Abstract

How  large  are  the  benefits  of  transportation  infrastructure  projects,  and  what  explains  these 
benefits?  To  shed  new  light  on  these  questions,  I  collect  archival  data  from  colonial  India 
and  use  it  to  estimate  the  impact  of  India’s  vast  railroad  network.  Guided  by  six  predictions 
from  a  general  equilibrium  trade  model,  I  find  that  railroads:  (1)  decreased  trade  costs  and 
interregional  price  gaps;  (2)  increased  interregional  and  international  trade;  (3)  eliminated  the 
responsiveness  of  local  prices  to  local  productivity  shocks  (but  increased  the  transmission  of 
these  shocks  between  regions);  (4)  increased  the  level  of  real  income  (but  harmed  neighboring 
regions without railroad access); (5) decreased the volatility of real income; and (6), a sufficient
statistic for the effect of railroads on welfare in the model accounts for virtually all of
the  observed  reduced-form  impact  of  railroads  on  real  income.  I  find  similar  results  from  an 
instrumental  variable  specification,  no  spurious  effects  from  over  40,000  km  of  lines  that  were 
approved  but  never  built,  and  tight  bounds  on  the  estimated  impact  of  railroads.  These  results 
suggest  that  transportation  infrastructure  projects  can  improve  welfare  significantly,  and  do 
so  because  they  allow  regions  to  exploit  gains  from  trade. 
1  Introduction


In 2007, almost 20 percent of World Bank lending was allocated to transportation infrastructure
projects, a larger share than that of education, health and social services combined (World
Bank 2007). These projects aim to reduce trade costs. In prominent models of international and
interregional trade, reductions in trade costs will increase the level of real income in trading regions.¹
A related body of theoretical work argues that trade cost reductions can change the volatility of
real income. This is a second welfare effect that may be especially important in predominantly
agricultural, low-income economies, but the theoretical predictions from this work are less
clear-cut.² Unfortunately, despite an emphasis on reducing trade costs in both economic theory and
contemporary aid efforts, we lack a rigorous empirical understanding of the extent to which
transportation infrastructure projects actually reduce the costs of trading, and how the resulting trade
cost reductions affect welfare.

In this paper I exploit one of history’s great transportation infrastructure projects, the vast
network of railroads built in colonial India (India, Pakistan and Bangladesh; henceforth, simply
‘India’), to make three contributions to our understanding of transportation infrastructure
improvements. In doing so I draw on a comprehensive new dataset on the Indian economy that I
have constructed. First, I estimate the extent to which railroads improved India’s trading
environment (i.e., reduced trade costs, reduced interregional price gaps, increased trade flows, and promoted
market  integration).  Second,  I  estimate  the  reduced-form  welfare  gains  (higher  real  income  levels 
and  lower  real  income  volatility)  that  the  railroads  brought  about.  Finally,  I  assess,  in  the  context 
of  a  general  equilibrium  trade  model,  how  much  of  these  reduced-form  welfare  gains  were  newly 
exploited  gains  from  trade. 

The  railroad  network  designed  and  built  by  the  British  government  in  India  (then  referred  to  as 
‘the  Raj’)  brought  dramatic  change  to  the  technology  of  trading  there.  Prior  to  the  railroad  age, 
bullocks  carried  most  of  India’s  commodity  trade  on  their  backs,  traveling  no  more  than  30  km 
per  day  along  India’s  sparse  network  of  dirt  roads  (Deloche  1994).  By  contrast,  railroads  could 
transport  these  same  commodities  600  km  in  a  day,  and  at  much  lower  per  unit  distance  freight 
rates.  As  the  67,247  km  long  railroad  network  expanded  (from  1853  to  1930),  it  penetrated  inland 
districts  (local  administrative  regions),  bringing  them  out  of  near-autarky  and  connecting  them 
with  the  rest  of  India  and  the  world.  I  use  the  arrival  of  the  railroad  network  in  each  district  to 
investigate  the  economic  impact  of  this  striking  improvement  in  transportation  infrastructure. 

¹ Workhorse trade models featuring trade costs include Dornbusch, Fischer, and Samuelson (1977), Krugman
(1980), Eaton and Kortum (2002) and Melitz (2003). In all of these theories, two trading regions will both gain
when the (iceberg) cost of trading goods between them falls symmetrically. However, there are theoretical settings
in which symmetric trade cost reductions can harm one of two trading regions; for example, if increasing returns to
scale production technologies mix with factor mobility, as in Krugman (1991), or traded intermediate goods, as in
Krugman and Venables (1995), then one of two trading regions can experience a welfare loss.

² For example, Newbery and Stiglitz (1981) present models in which openness to trade can either increase or
decrease real income volatility.

This setting is unique because the British government collected detailed records of economic
activity throughout India in this time period; remarkably, however, these records have never been
systematically  digitized  and  organized  by  researchers.  I  use  these  records  to  construct  a  new  dataset 
with almost seven million observations on district-level prices, output, daily rainfall and
interregional and international trade in India, as well as a digital map of India’s railroad network in which
each  20  km  segment  is  coded  with  its  year  of  opening.  This  dataset  allows  me  to  track  the  evolution 
of  India’s  district  economies  before,  during  and  after  the  expansion  of  the  railroad  network.  The 
records  on  interregional  trade  are  particularly  unique  and  important  here.  Information  on  trade 
flows  within  a  country  is  rarely  available  to  researchers,  yet  the  response  of  these  trade  flows  to 
a  transportation  infrastructure  improvement  says  a  great  deal  about  the  potential  for  gains  from 
trade  (as  I  describe  explicitly  below). 

To  guide  my  empirical  analysis  I  develop  a  multi-region,  multi-commodity,  Ricardian  trade  model, 
where trade occurs at a cost. Because of geographical heterogeneity, regions have differing
productivity levels across commodities, which creates incentives to trade to exploit comparative advantage.
A  new  railroad  link  between  two  districts  lowers  their  bilateral  trade  cost,  allowing  consumers  to 
buy  goods  from  the  cheapest  district,  and  producers  to  sell  more  of  what  they  are  best  at  producing. 
There  are  thousands  of  interacting  product  and  factor  markets  in  the  model.  But  the  analysis  of 
this  complex  general  equilibrium  problem  is  tractable  if  production  heterogeneity  takes  a  convenient 
but  plausible  functional  form,  as  shown  by  Eaton  and  Kortum  (2002). 

I use this model to assess empirically the importance of one particular mechanism linking
railroads to welfare improvements: that railroads reduce trade costs and thereby allow regions to gain
from trade. The model makes six predictions that drive my six-step empirical analysis:

1. Inter-district price differences are equal to trade costs (in special cases): That is, if a
commodity can be made in only one district (the ‘origin’) but is consumed in other districts
(‘destinations’), then that commodity’s origin-destination price difference is equal to its
origin-destination trade cost. I use this result to infer trade costs (which researchers never fully
observe) by exploiting widely-traded commodities that could only be made in one district.
Using inter-district price differentials, along with a graph theory algorithm embedded in a
non-linear least squares routine, I estimate the trade cost parameters governing traders’
endogenous route decisions on a network of roads, rivers, coasts and railroads. This is a novel
method  for  inferring  trade  costs  in  networked  settings.  My  resulting  parameter  estimates 
reveal  that  railroads  significantly  reduced  the  cost  of  trading  in  India. 

2. Bilateral trade flows take the ‘gravity equation’ form: That is, holding constant exporter- and
importer-specific  effects,  bilateral  trade  costs  reduce  bilateral  trade  flows.  I  find  that  railroad- 
driven  reductions  in  trade  costs  (estimated  in  Step  1)  increase  bilateral  trade  flows,  and  show 
that  the  parameters  estimated  from  the  gravity  equation  identify  my  model. 

3.  Railroads  reduce  the  responsiveness  of  prices  to  local  productivity  shocks:  That  is,  a  district’s 
prices  are  less  responsive  to  its  own  productivity  shocks  when  it  is  connected  to  the  railroad 
network; however, a district’s prices are more responsive to any other district’s productivity
shocks when these two districts are connected by a railroad line. I find empirical support
for both of these predictions. Specifically, in a novel test for market integration, I find
that railroads caused a dramatic reduction in the responsiveness of prices to local rainfall
shocks, reducing responsiveness to almost zero (even when focusing purely on rainfall
variation across crops, within a district and year). This implies that railroads brought India’s
district  economies  close  to  the  small  open  economy  limit  where  local  conditions  have  no  effect 
on  local  prices.  I  also  find  that  a  district’s  rainfall  shocks  affect  prices  in  neighboring  districts 
to  which  it  is  connected  by  the  railroad  network  (to  a  weak  but  statistically  significant  extent). 

4.  Railroads  increase  real  income  levels:  That  is,  when  a  district  is  connected  to  the  railroad 
network  its  real  income  rises;  however,  improvements  in  the  railroad  network  that  by-pass 
a  district  reduce  the  district’s  real  income  (a  negative  spillover  effect).  Empirically,  I  find 
that  own-railroad  access  raises  real  income  by  18  percent,  but  a  neighbor’s  access  reduces 
real  income  by  4  percent.  However,  these  are  reduced-form  estimates  that  could  be  due  to  a 
number  of  mechanisms.  A  key  goal  of  Step  6  is  to  assess  how  much  of  the  reduced-form  effect  of 
railroads  can  be  attributed  to  gains  from  trade  due  to  the  trade  cost  reductions  found  in  Step  1. 

5.  Railroads  decrease  real  income  volatility:  When  a  district  is  connected  to  the  railroad  network, 
its  real  income  is  less  responsive  to  stochastic  productivity  shocks  in  the  district  (which  reduces 
volatility).  Empirically,  I  find  that  railroads  reduced  the  responsiveness  of  real  agricultural 
income to local rainfall, which suggests a second welfare benefit of transportation
infrastructure (in addition to that found in Step 4) that has not, to my knowledge, been demonstrated
empirically  before.  However,  as  with  the  results  in  Step  4,  a  number  of  mechanisms  could 
underpin  this  reduced-form  result. 

6. There exists a sufficient statistic for the welfare gains from railroads: That is, despite the
complexity of the model’s general equilibrium relationships, the impact of the railroad network on
welfare in a district is captured by one variable: the share of that district’s expenditure that
it sources from itself. A prediction similar to this appears in a wide range of trade models
but has not, to my knowledge, been tested before.³ I test this prediction by regressing real
income on this sufficient statistic (as calculated using the model estimated in Steps 1 and 2)
alongside the regressors from Steps 4 and 5 (which capture the reduced-form impact of
railroads).⁴ When I do this, the reduced-form coefficients on railroad access estimated in Steps
4  and  5  fall  to  a  level  that  is  close  to  zero.  This  finding  provides  support  for  prediction  6  of 
the  model  and  suggests  that  decreased  trade  costs  account  for  virtually  all  of  the  real  income 
impacts  of  the  Indian  railroad  network. 

³ Arkolakis, Klenow, Demidova, and Rodriguez-Clare (2008) show that this prediction applies to the Krugman
(1980),  Eaton  and  Kortum  (2002),  Melitz  (2003),  and  Chaney  (2008)  models  of  trade,  but  these  authors  do  not  test 
this  prediction  in  their  empirical  application. 

⁴ This procedure is similar in spirit to the “sufficient statistic approach” proposed by Chetty (2008) as a
compromise  between  reduced-form  and  structural  methods  of  welfare  analysis. 
These six results demonstrate that India’s railroad network improved the trading environment
(Steps  1,  2  and  3),  generated  welfare  gains  (Steps  4  and  5),  and  that  these  welfare  gains  arose 
predominantly  because  railroads  allowed  regions  to  exploit  gains  from  trade  (Step  6). 
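
The route-choice idea behind Step 1 can be illustrated with a short sketch. The text says only that a "graph theory algorithm" is embedded in the non-linear least squares routine; the use of Dijkstra's shortest-path algorithm below is an assumption, as are the function name `lowest_cost`, the toy network, and the relative per-km cost values, which are placeholders and not estimates from this paper. In an estimation routine of this kind, a function like this would be called repeatedly for candidate cost parameters, and the implied route costs compared with observed price gaps.

```python
import heapq

# Hypothetical per-km cost of each mode relative to rail (rail = 1.0).
# Placeholder values for illustration only, not estimates from the paper.
RELATIVE_COST = {"rail": 1.0, "road": 4.5, "river": 3.0, "coast": 2.25}

def lowest_cost(edges, origin, destination, cost=RELATIVE_COST):
    """Dijkstra's algorithm over undirected (node_a, node_b, km, mode) edges."""
    graph = {}
    for a, b, km, mode in edges:
        w = km * cost[mode]
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == destination:
            return d
        if d > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return float("inf")  # destination unreachable

# Toy network: A-B by road (100 km); later also A-C-B by rail (150 + 150 km).
before = [("A", "B", 100, "road")]
after = before + [("A", "C", 150, "rail"), ("C", "B", 150, "rail")]
print(lowest_cost(before, "A", "B"))  # 450.0
print(lowest_cost(after, "A", "B"))   # 300.0
```

In this toy case the rail route is three times longer than the road but still cheaper, so the trader's endogenous route choice switches once the rail links open.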

Because  railroads  were  not  randomly  assigned  to  districts,  I  pursue  three  strategies  to  mitigate 
concerns  of  bias  due  to  a  potential  correlation  between  railroad  placement  and  unobserved  changes  in 
the local economic environment. First, I estimate four placebo specifications using over 40,000 km of
railroad lines that reached advanced stages of costly surveying but were never actually built, and find
no spurious effects from these unbuilt lines. Second, I estimate instrumental variable specifications in
which I instrument for railroad construction post-1884 with rainfall shortages in the 1876-78
agricultural years (because the 1880 Indian Famine Commission recommended that railroad lines be built
in regions that experienced drought in the 1876-78 famine), and find IV results that are very close
to my OLS results. Finally, in a bounds check, I find similar results among railroad lines whose
estimates are likely to be biased upwards and lines whose estimates are likely to be biased downwards.

This paper contributes to a growing literature on estimating the economic effects of large
infrastructure projects,⁵ as well as a literature on estimating the ‘social savings’ of railroad projects.⁶ A
distinguishing  feature  of  my  approach  is  that,  in  addition  to  estimating  reduced-form  relationships 
between infrastructure and welfare as in the existing literature, I fully specify and estimate a
general equilibrium model of how railroads affect welfare. The model makes auxiliary predictions and
suggests  a  sufficient  statistic  for  the  role  played  by  railroads  in  raising  welfare — all  of  which  shed 
light  on  the  economic  mechanisms  that  could  explain  my  reduced-form  estimates.  Using  a  model 
also  improves  the  external  validity  of  my  estimates  because  the  primitive  in  my  model — the  cost  of 
trading — is  specified  explicitly,  and  is  portable  to  a  range  of  settings  in  which  the  welfare  benefits 
of trade cost-reducing policies might be sought.⁷ By contrast, my reduced-form estimates are more
likely  to  be  specific  to  the  context  of  railroads  in  colonial  India.  Finally,  the  model  suggests  a 
general  equilibrium  treatment  externality  of  railroads  that,  if  ignored,  would  bias  estimates  of  the 
effects of this infrastructure project by almost 20 percent.⁸ This point has not, to my knowledge,
been  incorporated  before  in  the  infrastructure  literature,  or  in  the  literature  estimating  the  welfare 

⁵ For example, Dinkelman (2007) estimates the effect of electrification on labor force participation in South Africa,
Duflo  and  Pande  (2007)  estimate  the  effect  of  dam  construction  in  India  on  agriculture,  Jensen  (2007)  evaluates  how 
the  construction  of  cellular  phone  towers  in  South  India  improved  efficiency  in  the  fish  market,  and  Michaels  (2008) 
estimates  the  effect  of  the  US  Interstate  Highway  system  on  the  skilled  wage  premium.  An  older  literature,  beginning 
with  Aschauer  (1989),  pioneered  the  use  of  econometric  methods  in  estimating  the  benefits  of  infrastructure  projects. 

⁶ Fogel (1964) first applied the social savings methodology to railroads in the United States, and Hurd (1983)
performed a similar exercise for India. In section 7.7 I compare my estimates to those from using a social savings
approach.

⁷ For example, Raballand and Macchi (2008) find in surveys of African trucking firms that transportation costs
are  relatively  high  in  Africa  because  of  a  number  of  policy-relevant  features  (e.g.  poor  roads,  expensive  inputs,  and 
underutilized  payload  capacity).  Similarly,  Djankov,  Freund,  and  Pham  (2006)  survey  freight  forwarding  firms  in 
126  countries  to  measure  wider,  policy-relevant  costs  of  trading  (e.g.  inspections,  technical  clearance,  mandatory 
storage,  port  handling,  and  customs  clearance). 

⁸ Most of the policy evaluation literature assumes that policy treatments received by one unit of observation do
not affect outcomes for any other units (the “stable unit treatment value assumption,” in the language of Rubin
(1978)). Heckman and Abbring (2007) survey the recent literature on general equilibrium policy evaluation.
effects of openness to trade.⁹

The next section describes the historical setting in which the Indian railroad network was
constructed and the new data that I have collected from that setting. In section 3, I outline a model of
trade  in  colonial  India  and  the  model’s  six  predictions.  Sections  4  through  9  present  six  empirical 
steps  that  test  the  model’s  six  predictions  qualitatively  and  quantitatively.  Section  10  concludes. 


2  Historical  Background  and  Data 

In  this  section  I  discuss  some  essential  features  of  the  economy  in  colonial  India  and  the  data  that 
I  have  collected  in  order  to  analyze  how  this  economy  changed  with  the  advent  of  railroad  travel. 
I  go  on  to  describe  the  transportation  system  in  India  before  and  after  the  railroad  era,  and  the 
institutional  details  that  determined  when  and  where  railroads  were  built. 


2.1  New  Data  on  the  Indian  Economy,  1861-1930 

In  order  to  evaluate  the  impact  of  the  railroad  network  on  economic  welfare  in  colonial  India  I  have 
constructed  a  new  panel  dataset  on  239  Indian  districts.  The  dataset  tracks  these  districts  annually 
from  1861-1930,  a  period  during  which  98  percent  of  British  India’s  current  railroad  lines  were 
opened.  Table  1  contains  descriptive  statistics  for  the  variables  that  I  use  in  this  paper  and  describe 
throughout this section. Appendix A contains more detail on the construction of these variables.

During  the  colonial  period,  India’s  economy  was  predominantly  agricultural,  with  agriculture 
constituting an estimated 66 percent of GDP in 1900 (Heston 1983).¹⁰ For this reason, district-level
output data was only collected systematically in the agricultural sector. Data on agricultural
output was recorded for each of 17 principal crops (which comprised 93 percent of the cropped area of
India in 1900). Retail prices for these 17 crops were also recorded at the district level. I use these
price figures to construct a nominal agricultural GDP series for each district and year and then
a real agricultural income per acre figure by dividing by a consumer price index and district land
area.¹¹ The resulting real agricultural income per acre variable provides the best available measure
of  district-level  economic  welfare  in  this  time  period. 
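
As a rough illustration of the construction just described, the sketch below values each crop's output at its retail price, sums across crops, and divides by a consumer price index and by land area. The function name `real_income_per_acre` and all figures are invented for illustration. The construction of the consumer price index is not specified in this section, so it is treated here as a given input.

```python
def real_income_per_acre(prices, quantities, cpi, acres):
    """Nominal agricultural output deflated by a CPI, per acre of district land."""
    nominal = sum(prices[crop] * quantities[crop] for crop in quantities)
    return nominal / cpi / acres

# Invented figures for one district-year (only two crops shown for brevity).
prices = {"wheat": 2.0, "rice": 3.0}          # price per unit of output
quantities = {"wheat": 500.0, "rice": 200.0}  # units of output
value = real_income_per_acre(prices, quantities, cpi=1.25, acres=400.0)
print(value)  # 3.2
```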

⁹ Frankel and Romer (1999), Rodriguez and Rodrik (2001), Irwin and Tervio (2002), and Alcala and Ciccone
(2004) are leading contributions to this literature. But the empirical approach used in these papers assumes that
one country’s openness does not affect welfare in any other country.

¹⁰ Factory-based industry, which Chandler (1977) and Atack, Haines, and Margo (2008) argue benefited from
access to railroads in the United States, contributed only 1.6 percent of India’s GDP in 1900.

¹¹ The use of real income, rather than real GDP, in open-economy settings is advocated by Diewert and Morrison
(1986), Feenstra (2003) and Kehoe and Ruhl (2008). As the latter authors argue, real income captures the gains
from trade in a wide range of trade models, but real GDP does not.

¹² For comparison, Heston (1983) estimates that in 1869, on the basis of purchasing power exchange rates, per
capita income in the United States was four times that in India. This income disparity rises to ten if market
exchange rates are used instead of PPP rates.

Real incomes were low during my sample period, but there was 22 percent growth between 1870
and 1930.¹² Real incomes were low because crop yields were low, both by contemporaneous international
standards and by Indian standards today.¹³ One explanation for low yields, which featured
heavily  in  Indian  agricultural  textbooks  of  the  day  (such  as  Leake  (1923),  Mollison  (1901)  and 
Wallace  (1892)),  was  inadequate  water  supply.  Only  12  percent  of  cultivated  land  was  irrigated  in 
1885;  while  this  figure  had  risen  to  19  percent  in  1930,  the  vast  majority  of  agriculture  maintained 
its dependence on rainfall.¹⁴

Because  rainfall  was  important  for  agricultural  production,  3614  meteorological  stations  (plotted 
in  Figure  1)  were  built  throughout  the  country  to  record  the  amount  of  rainfall  at  each  station  on 
every  day  of  the  year.  Daily  rainfall  data  was  recorded  and  published  because  the  distribution  of 
rainfall  throughout  the  year  was  far  more  important  to  farmers  and  traders  than  total  annual  or 
monthly  amounts.  In  particular,  the  intra-annual  distribution  of  rainfall  governed  how  different 
crops  (which  were  grown  in  distinct  stretches  of  the  year)  were  affected  by  a  given  year’s  rainfall. 
In sections 5 through 9 I use daily rainfall data from each of India’s 3614 meteorological stations to
construct  crop-specific  measures  of  rainfall,  in  order  to  provide  exogenous  variation  in  crop-specific 
productivity. 
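
A minimal sketch of how a crop-specific rainfall measure could be built from daily records follows. It assumes that each crop has a fixed growing window within the year and that the measure is total rainfall inside that window. The window dates, the names `GROWING_WINDOW`, `in_window` and `crop_rainfall`, and the rainfall figures are all hypothetical, and for simplicity the sketch pools calendar days without assigning them to agricultural years.

```python
from datetime import date

# Hypothetical growing windows as (start month, start day, end month, end day).
GROWING_WINDOW = {"rice": (6, 1, 10, 31), "wheat": (11, 1, 3, 31)}

def in_window(day, window):
    """True if the calendar day falls inside the (possibly year-wrapping) window."""
    sm, sd, em, ed = window
    start, end, md = (sm, sd), (em, ed), (day.month, day.day)
    if start <= end:
        return start <= md <= end
    return md >= start or md <= end  # window wraps around the new year

def crop_rainfall(daily_mm, crop, windows=GROWING_WINDOW):
    """Total rainfall (mm) falling inside the crop's growing window."""
    return sum(mm for day, mm in daily_mm.items() if in_window(day, windows[crop]))

# Invented daily readings from one station.
daily_mm = {
    date(1900, 7, 15): 40.0,
    date(1900, 8, 2): 25.0,
    date(1900, 12, 10): 5.0,
    date(1901, 2, 1): 3.0,
}
print(crop_rainfall(daily_mm, "rice"))   # 65.0
print(crop_rainfall(daily_mm, "wheat"))  # 8.0
```

The same daily series yields different totals for the two crops, which is the source of within-district, within-year variation across crops.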

Rainfall was extremely volatile from year to year, giving rise to the common description of
colonial Indian agriculture as “a gamble in monsoons.”¹⁵ Like rainfall, prices, nominal agricultural
incomes, and real agricultural incomes were also volatile over time within districts. The clearest
manifestation of this volatility appeared in India’s 11 official famines between 1860 and 1930, in
which at least 15 million people died.¹⁶ Even beneath these extreme events lay significant real
income volatility. I investigate the role that railroads played in reducing this volatility in section 8.


2.2  Transportation  in  Colonial  India 

Prior to the railroad era, goods transport within India took place on roads, rivers, and coastal
shipping routes. The bulk of inland travel was carried by bullocks, along the road network.¹⁷ Bullocks
were  employed  either  as  ‘pack  bullocks’  (which  carried  goods  strapped  to  their  backs  and  usually 
traveled  directly  over  pasture  land),  or  ‘cart  bullocks’  (which  pulled  a  cart  containing  goods  and 
traveled  along  improved  roads).  On  the  best  road  surfaces  and  during  optimal  weather  conditions, 
cart  bullocks  could  cover  20-30  km  per  day.  However,  high-quality  roads  were  extremely  sparse  and 

¹³ For example, the yield of wheat in India’s ‘breadbasket’, the province of Punjab, was 748 lbs/acre in 1896. By
contrast, for similar types of wheat, yields in Nevada (the highest state yields in the United States) in 1900 were
almost twice as high (see plate 15 of United States Census Office (1902)) and yields in (Indian) Punjab in 2005 were
over five times higher than those in 1896 (as calculated from the Indian District-wise Crop Production Statistics
Portal, http://dacnet.nic.in/apy/cps.aspx).

¹⁴ These figures encompass a wide definition of irrigation, including the use of tanks, cisterns, and reservoirs as
well as canals. See the Agricultural Statistics of India, described in Appendix A. 1885 is the first year in which
comprehensive  irrigation  statistics  were  collected. 

¹⁵ See, for example, Gadgil, Rajeevan, and Francis (2007). The phrase is still used to refer to agriculture today; for
example, in the state of Orissa’s 2005 Economic Survey, a state in which only 56 percent of cultivated land is irrigated.

¹⁶ I calculate this figure from Appendix Table 5.2 of Visaria and Visaria (1983). Davis (2001) argues that this is a
gross  underestimate  and  suggests  an  upper  estimate  of  34  million  deaths.  In  particular,  Visaria  and  Visaria  (1983) 
do  not  include  any  deaths  from  the  severe  famine  of  1899-1900,  for  want  of  data. 

¹⁷ Camels were also used in sandy areas. Horses, ponies, donkeys, mules and elephants were less common forms
of  animal-powered  transportation. 
the roads that did exist were virtually impassable in the monsoon season (Deloche 1994). Pack
bullocks  were  more  versatile  than  cart  bullocks,  but  their  freight  rates  were  three  times  higher  per 
unit  distance  and  weight  (Derbyshire  1985). 

Water  transport  was  far  superior  to  road  transport,  but  it  was  only  feasible  on  the  Brahmaputra, 
Ganges and Indus river systems.¹⁸ In optimal conditions, downstream river traffic (with additional
oar power¹⁹) could cover 65 km per day; upstream traffic needed to be towed from the banks and
struggled  to  cover  15  km  per  day.  Extensive  river  travel  was  impossible  in  the  rainy  monsoon 
months,  or  the  dry  summer  months,  and  piracy  was  a  serious  hazard  (Deloche  1995).  Coastal 
shipping,  however,  was  perennially  available  along  India’s  long  coastline.  This  form  of  shipping  was 
increasingly  steam-powered  post-1840.  Steamships  were  fast,  covering  over  100  km  per  day,  but 
they  could  only  service  major  ports.  The  bulk  of  this  trade,  both  before  and  after  the  railroads, 
therefore  consisted  of  shipments  between  the  major  ports  (Naidu  1936). 

Against this backdrop of costly and slow internal transportation, the appealing prospect of
railroad transportation in India was discussed as early as 1832 (Sanyal 1930), though it was not until
1853  that  the  first  track  was  actually  laid.  From  the  outset,  railroad  transport  proved  to  be  far 
superior  to  road,  river  or  coastal  transport  (Banerjee  1966).  Railroads  were  capable  of  traveling 
up  to  600  km  per  day  and  they  offered  this  superior  speed  on  predictable  timetables,  throughout 
all  months  of  the  year,  without  any  risk  of  piracy  (Johnson  1963).  Railroad  freight  rates  were 
also  considerably  cheaper:  4-5,  2-4,  and  1.5-3  times  cheaper  than  road,  river  and  coastal  travel, 
respectively  (Deloche  1994,  Deloche  1995,  Derbyshire  1985,  Hurd  1975). 


2.3  Railroad  Line  Placement  Decisions 

Throughout  the  history  of  India’s  railroads,  all  railroad  line  placement  decisions  were  made  by 
the Government of India. It is widely accepted that the Government had three motives for
building railroads: military, commercial, and humanitarian, in that order of priority (Thorner 1950,
Macpherson 1955, Headrick 1988). In 1853, Lord Dalhousie (head of the Government of India)
wrote an internal document to the East India Company’s Court of Directors that sketched
railroad policy in India for decades to come. Military motivations for railroad-building appeared on
virtually every page of this document,²⁰ and these motivations gained new momentum when the
1857  ‘mutiny’  highlighted  the  importance  of  military  communications  (Headrick  1988).  Dalhousie’s 
minute  described  five  ‘trunk  lines’  that  would  connect  India’s  five  major  provincial  capitals  along 
direct  routes  and  maximize  the  “political  advantages”  of  a  railroad  network. 

¹⁸ Navigable canals either ran parallel to sections of these three rivers or were extremely localized in a small
number  of  coastal  deltas  (Stone  1984,  Whitcombe  1983). 

¹⁹ Steamboats had periods of success in the colonial era, but were severely limited in scope by India’s seasonal and
shifting  rivers  (Derbyshire  1985). 

²⁰ For example, from the introduction: “A single glance...will suffice to show how immeasurable are the political
advantages to be derived from the system of internal communication, which would admit of full intelligence of
every event being transmitted to the Government...and would enable the Government to bring the main bulk of its
military  strength  to  bear  upon  any  given  point  in  as  many  days  as  it  would  now  require  months,  and  to  an  extent 
which  at  present  is  physically  impossible.”  (House  of  Commons  Papers  1853). 
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Between  1853  and  1869,  all  of  Dalhousie’s  trunk  lines  were  built — but  not  without  significant 
debate  over  how  best  to  connect  the  provincial  capitals.  Dalhousie  and  Major  Kennedy,  India’s 
Chief  Engineer,  spent  over  a  decade  discussing  and  surveying  (at  great  cost)  their  competing, 
but  very  different,  proposals  for  a  pan-Indian  network  (Davidson  1868,  Settar  1999).  This  debate 
indicates  the  vicissitudes  of  railroad  planning  in  India  and  it  was  repeated  many  times  by  different 
actors  in  Indian  railroad  history.  I  have  collected  planning  documents  from  a  number  of  railroad 
expansion  proposals  that,  like  Kennedy’s  proposal,  were  debated  and  surveyed  at  length,  but  were 


never  actually  built.  As  discussed  in  section  7.4,  I  use  these  plans  in  a  placebo  strategy  to  check  that 
unbuilt  lines  display  no  spurious  ‘impact’  on  the  district  economics  in  which  they  were  nearly  built. 

By 1876, railroad expansion had slowed significantly in India. But railroads benefited from new
enthusiasm in the wake of the 1880 Famine Commission, which recommended railroads as a means
of future famine prevention. The Commission’s recommendations for specific railroad lines formed
the bedrock on which more detailed plans over the ensuing 15 years were built. In section 7.5 I
describe how this motivates an instrumental variable for railroad construction. A second consequence
of the 1880 Famine Commission report, from the perspective of my identification strategy in section
7.6, is that all railroad proposals from 1883 to 1904 were required to be designated according to
their intended purpose. I use this feature to motivate a set of bounds on my estimates of the average
effect of railroads.

As is clear from Figure 2, the railroad network in place in 1930 (by and large, the same network
that is open today) had completely transformed the transportation system in India. A total of 67,247 km
of track was open for traffic, constituting the fourth-largest network in the world. From their
inception in 1853 to their zenith in 1930, railroads were the dominant form of public investment in
British India.21 But influential observers were highly critical of this public investment priority: the
Nationalist historian, Romesh Dutt, argued that railroads did little to promote agricultural
development,22 and Mahatma Gandhi argued simply that “there can be little doubt that they [railroads]
promote evil.”23 In the remainder of this paper I use new data to assess quantitatively the effect
of railroads on India’s trading environment and agricultural economy.


21 Kumar (1983) summarizes government revenues and expenditure in India. Public investment (which included
railroads,  roads,  irrigation,  buildings,  health  and  education)  accounted  for  around  20  percent  of  total  expenditure 
(the  largest  category  behind  defense)  and  railroads  accounted  for  over  40  percent  of  this  category. 

22 For example, on page 174 of his textbook on Indian economic history: “Railways... did not add to the produce
of  the  land.”  (Dutt  1904) 

23 From Chapter IX of the 1938 English translation of Gandhi’s 1909 Hind Swaraj [Indian Home Rule], his
influential  newspaper  columns.  Other  passages  are  equally  polemic:  “...but  for  the  railways,  the  English  could  not 
have  such  a  hold  on  India  as  they  have.  The  railways,  too,  have  spread  the  bubonic  plague... Railways  have  also 
increased  the  frequency  of  famines,  because,  owing  to  facility  of  means  of  locomotion,  people  sell  out  their  grain, 
and  it  is  sent  to  the  dearest  markets.... They  [railways]  accentuate  the  evil  nature  of  man.  Bad  men  fulfil  their  evil 
designs  with  greater  rapidity.”  (Gandhi  1938) 


3  A  Model  of  Railroads  and  Trade  in  Colonial  India 


In this section I develop a general equilibrium model of trade among many regions in the presence of
trade costs. The model is based on Eaton and Kortum (2002), extended to a setting with more than
one commodity; this extension allows me to generate cross-commodity predictions that exploit the
full richness of my commodity-level data.24 The model serves two purposes. First, it delivers six
predictions about the response of observables to trade cost reductions. Second, I estimate the model and
use it to assess whether the observed reduction in trade costs due to the railroads can account, via the
mechanism stressed in this model, for the observed increase in welfare due to railroads. Both of these
features inform our understanding of how transportation infrastructure projects can raise welfare.

3.1  Model  Environment 

The  economy  consists  of  D  regions  (indexed  by  either  o  or  d).  There  are  K  commodities  (indexed 
by  k),  each  available  in  a  continuum  (with  mass  normalized  to  one)  of  horizontally  differentiated 
varieties  (indexed  by  j).  In  my  empirical  application  I  work  with  data  on  prices,  output  and  trade 
flows  that  refer  to  commodities,  not  individual  varieties.  While  my  empirical  setting  will  consider  70 
years  of  annual  observations,  for  simplicity  the  model  is  static;  I  therefore  suppress  time  subscripts 
until  they  are  necessary. 

Consumer  Preferences: 

Each region $o$ is home to a mass (normalized to one) of identical agents, each of whom owns $L_o$
units of land. Land is geographically immobile and supplied inelastically. Agents have Cobb-Douglas
preferences over commodities ($k$) and constant elasticity of substitution preferences over
varieties ($j$) within each commodity; that is, their (log) utility function is

$$\ln U_o = \sum_{k=1}^{K} \frac{\mu_k}{\varepsilon_k} \ln \int_0^1 \left( C_o^k(j) \right)^{\varepsilon_k} dj, \tag{1}$$

where $C_o^k(j)$ is consumption, $\varepsilon_k = \frac{\sigma_k - 1}{\sigma_k}$ (where $\sigma_k$ is the (constant) elasticity of
substitution), and $\sum_k \mu_k = 1$. Agents rent out their land at the rate of $r_o$ per unit and use their
income $r_o L_o$ to maximize utility from consumption.

Production  and  Market  Structure: 

Each variety $j$ of the commodity $k$ can be produced using a constant returns to scale production
technology in which land is the only factor of production.25 Let $z_o^k(j)$ denote the amount of variety $j$

24 While Eaton and Kortum (2002) use a continuum of varieties ($j$ in the notation I use below), these are all
varieties from one commodity ($k$ in the notation I use below). The main predictions of their model, and the model
that I present here, are at the level of commodities rather than varieties.

25 It is straightforward to extend this setting to an arbitrary number of immobile factors (provided that the
stochastic productivity term remains Hicks-neutral) by replacing the land rental rate $r_o$ with a general unit cost
function $c^k(\mathbf{w}_o)$, where $\mathbf{w}_o$ is the vector of factor payments in region $o$.

of commodity $k$ that can be produced with one unit of land in region $o$. I follow Eaton and Kortum
(2002) in modeling $z_o^k(j)$ as the realization of a stochastic variable $Z_o^k$ drawn from a Type-II extreme
value distribution whose parameters vary across regions and commodities; that is,

$$F_o^k(z) = \Pr(Z_o^k \le z) = \exp\left( -A_o^k z^{-\theta_k} \right), \tag{2}$$

where $A_o^k > 0$ and $\theta_k > 0$. These random variables are drawn independently for each variety,
commodity and region.26 The exogenous parameter $A_o^k$ increases the probability of high productivity
draws and the exogenous parameter $\theta_k$ captures (inversely) how variable the productivity of
commodity $k$ in any region is around its average.
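As a rough illustration of equation (2), the sketch below draws productivities by inverting the CDF and compares the simulated share of draws below a cutoff with the formula. The parameter values (A = 2, θ = 4, cutoff 1.2) and the function names are illustrative assumptions, not values or code from the paper.

```python
import math
import random


def frechet_cdf(z, A, theta):
    """Equation (2): Pr(Z <= z) = exp(-A * z^(-theta))."""
    return math.exp(-A * z ** (-theta))


def draw_productivity(A, theta, rng):
    """Inverse-CDF draw: solve u = exp(-A * z^(-theta)) for z."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() never returns 1.0
        u = rng.random()
    return (-math.log(u) / A) ** (-1.0 / theta)


rng = random.Random(0)
A, theta = 2.0, 4.0  # hypothetical parameter values
draws = [draw_productivity(A, theta, rng) for _ in range(100_000)]

z0 = 1.2
empirical = sum(z <= z0 for z in draws) / len(draws)
theoretical = frechet_cdf(z0, A, theta)
print(f"simulated share below {z0}: {empirical:.4f}, formula: {theoretical:.4f}")
```

A larger A lowers the CDF at any cutoff (high draws become more likely), while a larger θ concentrates the draws around their average.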

There are many competitive firms in region $o$ with access to the above technology; consequently,
firms make zero profits.27 These firms will therefore charge a pre-trade costs price of
$p_{oo}^k(j) = r_o / z_o^k(j)$, where $r_o$ is the land rental rate in region $o$.


Opportunities  to  Trade: 

Without  opportunities  to  trade,  consumers  in  region  d  must  consume  even  their  region’s  worst  draws 
from the productivity distribution in equation (2). The ability to trade breaks this production-
consumption link. This allows consumers to import varieties from other regions in order to take
advantage  of  the  favorable  productivity  draws  available  there,  and  allows  producers  to  produce 
more  of  the  varieties  for  which  they  received  the  best  productivity  draws.  These  two  mechanisms 
constitute  the  gains  from  trade  in  this  model. 

However, there is a limit to trade because the movement of goods is subject to trade costs (which
include transport costs and other barriers to trade). These trade costs take the convenient and
commonly used Samuelson (1954) ‘iceberg’ form. That is, in order for one unit of commodity $k$
to arrive in region $d$, $T_{od}^k \ge 1$ units of the commodity must be produced and shipped in region $o$;
trade is free when $T_{od}^k = 1$. (Throughout this paper I refer to trade flows between an origin region
$o$ and a destination region $d$; all bilateral variables, such as $T_{od}^k$, refer to quantities from $o$ to $d$.)

26 The assumption of within-sector heterogeneity characterized by a continuous stochastic distribution of productivities
is a standard feature in the literature on trade with heterogeneous firms (eg Melitz (2003)). It is common in
that  literature  to  assume  that  the  productivity  distribution  is  Pareto  (to  which  the  upper  tail  of  a  Type-II  extreme 
value  distribution  converges)  and  that  productivities  are  drawn  independently  across  varieties  (firms),  commodities, 
and  countries  (eg  Melitz  and  Ottaviano  (2007),  Chaney  (2008)  and  Helpman,  Melitz,  and  Rubinstein  (2008).)  An 
attraction  of  the  Type-II  extreme  value  distribution  is  its  plausible  micro-foundations:  Kortum  (1997)  applies  the 
extremal  types  theorem  to  show  that  the  distribution  of  productivities  among  producers  who  use  only  the  highest 
draws  from  any  iid  process  of  ‘ideas’  will  converge  to  an  extreme  value  distributional  form.  Nevertheless,  Costinot 
and  Komunjer  (2008)  show  that  the  key  features  of  the  Eaton  and  Kortum  (2002)  model  hold  locally  around  a 
symmetric  distribution  of  exogenous  productivity  terms  for  any  continuous  productivity  distribution. 

27 My empirical application is primarily to the agricultural sector. This sector was characterized by millions of
smallholding farmers who were likely to be price-taking producers of undifferentiated products (varieties $j$ in my model).
For example, in the 1901 census in the province of Madras, workers in the agricultural sector (67.9 percent of the
almost 20 million strong workforce) were separately enumerated by their ownership status, and 35.7 percent of these
workers were owner-cultivators (extremely small-scale farms) (Risley and Gait 1903). Nevertheless, Bernard, Eaton,
Jensen, and Kortum (2003) and Eaton, Kortum, and Kramarz (2005) extend the Eaton and Kortum (2002) framework
to allow for Bertrand and monopolistic competition, respectively. While in principle it is possible to estimate these
alternative models, the most natural way to do so uses firm-level trade data, which are unavailable in my setting.

Trade costs are assumed to satisfy the property that it is always (weakly) cheaper to ship directly
from region $o$ to region $d$, rather than via some third region $m$: that is, $T_{od}^k \le T_{om}^k T_{md}^k$. Finally,
I normalize $T_{oo}^k = 1$. In my empirical setting I proxy for $T_{od}^k$ with measures calculated from the
observed transportation network, which incorporates all possible modes of transport between region
$o$ and region $d$. Railroads enter this transportation network gradually over time, reducing $T_{od}^k$ and
creating more gains from trade.

Trade costs drive a wedge between the price of an identical variety in two different regions. Let
$p_{od}^k(j)$ denote the price of variety $j$ of commodity $k$ produced in region $o$, but shipped to region $d$
for consumption there. The iceberg formulation of trade costs implies that any variety in region $d$
will cost $T_{od}^k$ times more than in region $o$; that is, $p_{od}^k(j) = T_{od}^k p_{oo}^k(j) = r_o T_{od}^k / z_o^k(j)$.


Equilibrium  Prices  and  Allocations: 

Consumers have preferences for all varieties $j$ along the continuum of varieties of commodity $k$. But
they are indifferent about where a given variety is made: they simply buy from the region that
can provide the variety at the lowest cost. I therefore solve for the equilibrium prices that consumers
in a region $d$ actually face, given that they will only buy a given variety from the cheapest source
region (including their own).

The price of a variety sent from region $o$ to region $d$, denoted by $p_{od}^k(j)$, is stochastic because it
depends on the stochastic variable $z_o^k(j)$. Since $z_o^k(j)$ is drawn from the CDF in equation (2), $p_{od}^k(j)$
is the realization of a random variable $P_{od}^k$ drawn from the CDF

$$G_{od}^k(p) = \Pr(P_{od}^k \le p) = 1 - \exp\left[ -A_o^k (r_o T_{od}^k)^{-\theta_k} p^{\theta_k} \right]. \tag{3}$$

This is the price distribution for varieties (of commodity $k$) made in region $o$ that could potentially
be bought in region $d$. The price distribution for the varieties that consumers in $d$ will actually
consume (whose CDF is denoted by $G_d^k(p)$) is the distribution of prices that are the lowest among
all $D$ regions of the world:

$$G_d^k(p) = 1 - \prod_{o=1}^{D} \left[ 1 - G_{od}^k(p) \right] = 1 - \exp\left( -\left[ \sum_{o=1}^{D} A_o^k (r_o T_{od}^k)^{-\theta_k} \right] p^{\theta_k} \right).$$

Given this distribution of the actual prices paid by consumers in region $d$, it is straightforward
to calculate any moment of the prices of interest. The price moment that is important for my
empirical analysis is the expected value of the equilibrium price of any variety $j$ of commodity $k$
found in region $d$, which is given by

$$E[p_d^k(j)] = p_d^k = \lambda_1^k \left[ \sum_{o=1}^{D} A_o^k (r_o T_{od}^k)^{-\theta_k} \right]^{-1/\theta_k}, \tag{4}$$

where $\lambda_1^k = \Gamma(1 + \frac{1}{\theta_k})$.28 In section 6 I treat these expected prices as equal to the observed prices
collected by statistical agencies.29
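Equations (3) and (4) can be checked by simulation. The sketch below (an illustration only; the three-region parameter values and function names are assumptions, not the paper's) draws a productivity for each origin, keeps the cheapest delivered price for each variety, and compares the average with the closed form in equation (4).

```python
import math
import random


def expected_price(A, r, T_to_d, theta):
    """Equation (4): Gamma(1 + 1/theta) * [sum_o A_o (r_o T_od)^(-theta)]^(-1/theta)."""
    phi = sum(a * (ro * t) ** (-theta) for a, ro, t in zip(A, r, T_to_d))
    return math.gamma(1.0 + 1.0 / theta) * phi ** (-1.0 / theta)


def simulated_price(A, r, T_to_d, theta, n_varieties, rng):
    """Average, over varieties, of the lowest delivered price r_o T_od / z_o(j)."""
    total = 0.0
    for _ in range(n_varieties):
        cheapest = math.inf
        for a, ro, t in zip(A, r, T_to_d):
            u = rng.random()
            while u == 0.0:  # avoid log(0)
                u = rng.random()
            z = (-math.log(u) / a) ** (-1.0 / theta)  # draw from equation (2)
            cheapest = min(cheapest, ro * t / z)
        total += cheapest
    return total / n_varieties


# Hypothetical values: productivity, land rents, and trade costs into region d.
A = [1.0, 2.0, 1.5]
r = [1.0, 1.1, 0.9]
T_to_d = [1.0, 1.3, 1.6]
theta = 4.0

formula = expected_price(A, r, T_to_d, theta)
simulated = simulated_price(A, r, T_to_d, theta, 50_000, random.Random(1))
print(f"equation (4): {formula:.4f}, simulation: {simulated:.4f}")
```

Land rents are taken as given here; in the model they are determined by the trade balance conditions.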

Given the price distribution in equation (3), Eaton and Kortum (2002) derive two important
properties of the trading equilibrium that carry over to the model here. First, the price distribution
of the varieties that any given origin actually sends to destination $d$ (ie the distribution of prices
for which this origin is region $d$’s cheapest supplier) is the same for all origin regions. This implies
that the share of expenditure that consumers in region $d$ allocate to varieties from region $o$ must be
equal to the probability that region $o$ supplies a variety to region $d$ (because the price per variety,
conditional on the variety being supplied to $d$, does not depend on the origin). That is, $X_{od}^k / X_d^k = \pi_{od}^k$,
where $X_{od}^k$ is total expenditure in region $d$ on commodities of type $k$ from region $o$, $X_d^k = \sum_o X_{od}^k$
is total expenditure in region $d$ on commodities of type $k$, and $\pi_{od}^k$ is the probability that region $d$
sources any variety of commodity $k$ from region $o$. Second, this probability $\pi_{od}^k$ is given by

$$\pi_{od}^k = \frac{X_{od}^k}{X_d^k} = \lambda_3^k A_o^k (r_o T_{od}^k)^{-\theta_k} (p_d^k)^{\theta_k}, \tag{5}$$

where $\lambda_3^k = (\lambda_1^k)^{-\theta_k}$, and this equation makes use of the definition of the expected value of prices
(ie $p_d^k$) in equation (4).
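A minimal numerical sketch of equation (5) follows. It computes the trade shares in their normalized form (each origin's term divided by the sum over origins) and confirms that the same numbers come out of equation (5) once the expected price from equation (4) is substituted in. The parameter values and function names are illustrative assumptions.

```python
import math


def trade_shares(A, r, T, theta):
    """pi[o][d]: probability that region d buys a given variety from region o."""
    D = len(A)
    pi = [[0.0] * D for _ in range(D)]
    for d in range(D):
        terms = [A[o] * (r[o] * T[o][d]) ** (-theta) for o in range(D)]
        phi_d = sum(terms)
        for o in range(D):
            pi[o][d] = terms[o] / phi_d
    return pi


def share_from_equation_5(A_o, r_o, T_od, p_d, theta):
    """Equation (5): lambda_3 * A_o * (r_o T_od)^(-theta) * p_d^theta."""
    lam1 = math.gamma(1.0 + 1.0 / theta)
    lam3 = lam1 ** (-theta)
    return lam3 * A_o * (r_o * T_od) ** (-theta) * p_d ** theta


# Hypothetical three-region example; T[o][d] is the cost of shipping from o to d.
A = [1.0, 2.0, 1.5]
r = [1.0, 1.1, 0.9]
T = [[1.0, 1.4, 1.8], [1.4, 1.0, 1.3], [1.8, 1.3, 1.0]]
theta = 4.0

pi = trade_shares(A, r, T, theta)

d = 2  # destination region used for the comparison
phi_d = sum(A[o] * (r[o] * T[o][d]) ** (-theta) for o in range(3))
p_d = math.gamma(1.0 + 1.0 / theta) * phi_d ** (-1.0 / theta)  # equation (4)
via_eq5 = [share_from_equation_5(A[o], r[o], T[o][d], p_d, theta) for o in range(3)]
print([round(x, 4) for x in via_eq5])
```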

Equation (5) characterizes trade flows conditional on the endogenous land rental rate, $r_o$ (and all
regions’ land rental rates, which appear in $p_d^k$). It remains to solve for these land rents in equilibrium,
by imposing the condition that each region’s trade is balanced. Region $o$’s trade balance equation
requires that the total income received by land owners in region $o$ ($r_o L_o$) must equal the total value of
all commodities made in region $o$ and sent to every other region (including region $o$ itself). That is:

$$r_o L_o = \sum_d \sum_k X_{od}^k = \sum_d \sum_k \pi_{od}^k \mu_k r_d L_d, \tag{6}$$

where the last equality uses the fact that (with Cobb-Douglas preferences) expenditure in region $d$
on commodity $k$ ($X_d^k$) will be a fixed share ($\mu_k$) of the total income in region $d$ ($r_d L_d$). Each of the
$D$ regions has its own trade balance equation of this form. I take the rental rate in the first region
($r_1$) as the numeraire good, so the equilibrium of the model is the set of $D-1$ unknown rental rates
$r_d$ that solves this system of $D-1$ (non-linear) independent equations.
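One way to solve the system in equation (6) numerically is a damped fixed-point iteration on land rents. The sketch below is an assumption-laden illustration, not the paper's procedure: it uses a single commodity (so $\mu_k = 1$), hypothetical parameter values, and a damping exponent of $1/(1+\theta)$ that I chose for numerical stability.

```python
def trade_shares(A, r, T, theta):
    """pi[o][d]: share of region d's spending that goes to origin o."""
    D = len(A)
    pi = [[0.0] * D for _ in range(D)]
    for d in range(D):
        terms = [A[o] * (r[o] * T[o][d]) ** (-theta) for o in range(D)]
        phi_d = sum(terms)
        for o in range(D):
            pi[o][d] = terms[o] / phi_d
    return pi


def trade_balance_gap(A, L, T, theta, r):
    """Land income minus sales, r_o L_o - sum_d pi_od r_d L_d, for each region o."""
    D = len(A)
    pi = trade_shares(A, r, T, theta)
    return [r[o] * L[o] - sum(pi[o][d] * r[d] * L[d] for d in range(D))
            for o in range(D)]


def solve_rents(A, L, T, theta, iterations=2000):
    """Raise a region's rent when its sales exceed its land income, and vice versa."""
    D = len(A)
    r = [1.0] * D
    step = 1.0 / (1.0 + theta)  # damping exponent: an assumption, chosen for stability
    for _ in range(iterations):
        pi = trade_shares(A, r, T, theta)
        sales = [sum(pi[o][d] * r[d] * L[d] for d in range(D)) for o in range(D)]
        r = [r[o] * (sales[o] / (r[o] * L[o])) ** step for o in range(D)]
        r = [ro / r[0] for ro in r]  # the first region's rent is the numeraire
    return r


# Hypothetical three-region economy.
A = [1.0, 2.0, 1.5]
L = [1.0, 1.5, 0.8]
T = [[1.0, 1.4, 1.8], [1.4, 1.0, 1.3], [1.8, 1.3, 1.0]]
theta = 4.0

rents = solve_rents(A, L, T, theta)
gaps = trade_balance_gap(A, L, T, theta, rents)
print("equilibrium rents:", [round(x, 4) for x in rents])
```

Because the trade shares depend only on relative rents, one balance equation is redundant, which is why the rent of the first region can be fixed at one.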


28 $\Gamma(\cdot)$ is the Gamma function defined by $\Gamma(z) = \int_0^\infty t^{z-1} e^{-t} dt$.

29 A second price moment that is of interest for welfare analysis is the exact price index over all varieties of
commodity $k$ for consumers in region $d$. Given CES preferences, this is
$\tilde{p}_d^k = \left[ \int_0^1 (p_d^k(j))^{1-\sigma_k} dj \right]^{1/(1-\sigma_k)}$, which is only
well defined here for $\sigma_k < 1 + \theta_k$ (a condition I assume throughout). The exact price index is given by
$\tilde{p}_d^k = \lambda_2^k p_d^k$, where $\lambda_2^k = \gamma^k / \lambda_1^k$ and
$\gamma^k = \left[ \Gamma\left( \frac{\theta_k + 1 - \sigma_k}{\theta_k} \right) \right]^{1/(1-\sigma_k)}$. That is, if statistical agencies sampled varieties in proportion to their
weights in the exact price index, as opposed to randomly as in the expected price formulation of equation (4), then
this would not jeopardize my empirical procedure because the exact price index is proportional to expected prices.

3.2 Six Predictions


In  this  section  I  state  explicitly  six  of  the  model’s  predictions.  These  predictions  are  presented  in 
the  order  in  which  I  test  for  them  in  my  empirical  analysis  (ie  Steps  1-6). 


Prediction  1:  Price  Differences  Measure  Trade  Costs  (in  Special  Cases): 

In the presence of trade costs, the price of identical commodities will differ across regions. In
general, the cost of trading a commodity between two regions places only an upper bound on their price
differential.30 However, in the special case of a homogeneous commodity that can only be produced
in one origin region, equation (4) predicts that the (log) price differential between the origin $o$ of
this commodity and any other region $d$ will be equal to the (log) cost of trading the commodity
between them. That is:

$$\ln p_d^o - \ln p_o^o = \ln T_{od}^o, \tag{7}$$

where the commodity label $k$ is replaced by $o$ to indicate that this equation is only true for
commodities that can only be made in region $o$. This prediction is important for my empirical work because
it allows trade costs ($T_{od}^o$), which are typically unobserved, to be inferred.31 But it is important to
note that this prediction (essentially just free arbitrage over space, net of trade costs) is common
to many models of spatial equilibrium.32
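The logic of Prediction 1 can be seen in a two-region sketch built on equation (4). When only the origin can produce the commodity, the log price gap equals the log trade cost; when both regions produce it, the gap is smaller than the trade cost. The numbers and names below are hypothetical, and land rents are held fixed rather than solved for.

```python
import math


def expected_price(A, r, T_to_d, theta):
    """Equation (4) for one destination; T_to_d[o] is the cost of shipping from o."""
    phi = sum(a * (ro * t) ** (-theta) for a, ro, t in zip(A, r, T_to_d))
    return math.gamma(1.0 + 1.0 / theta) * phi ** (-1.0 / theta)


theta = 4.0
T_od = 1.4          # hypothetical trade cost between origin o (index 0) and region d (index 1)
r = [1.0, 1.0]      # land rents, taken as given for this illustration

# Case 1: only region o can produce the commodity (A_d = 0).
A_single = [1.5, 0.0]
p_o = expected_price(A_single, r, [1.0, T_od], theta)   # price at the origin
p_d = expected_price(A_single, r, [T_od, 1.0], theta)   # price at the destination
gap_single = math.log(p_d) - math.log(p_o)

# Case 2: both regions can produce it, so the gap only bounds the trade cost.
A_both = [1.5, 1.0]
p_o_both = expected_price(A_both, r, [1.0, T_od], theta)
p_d_both = expected_price(A_both, r, [T_od, 1.0], theta)
gap_both = math.log(p_d_both) - math.log(p_o_both)

print(f"log trade cost: {math.log(T_od):.4f}")
print(f"gap, single origin: {gap_single:.4f}; gap, two producers: {gap_both:.4f}")
```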


Prediction  2:  Bilateral  Trade  Flows  Take  the  ‘Gravity  Equation’  Form: 

Equation (5) describes bilateral trade flows explicitly, but I re-state it here in logarithms for
reference: (log) bilateral trade of any commodity $k$ from any region $o$ to any other region $d$ is given by

$$\ln X_{od}^k = \ln \lambda_3^k + \ln A_o^k - \theta_k \ln r_o - \theta_k \ln T_{od}^k + \theta_k \ln p_d^k + \ln X_d^k. \tag{8}$$

This is the gravity equation form for bilateral trade flows: bilateral trade costs reduce bilateral

30 This can be easily seen in two simple settings where bilateral trade costs are infinite, but bilateral inter-regional
price  differences  are  zero:  (i)  two  identical  autarkic  economies  will  have  a  price  differential  of  zero,  but  infinite 
trade  costs  vis-a-vis  each  other;  (ii)  two  regions  that  have  infinite  trade  costs  vis-a-vis  each  other  could  both  buy  a 
commodity  from  some  common  third  region  from  which  they  are  both  separated  by  the  same  trade  cost  (meaning 
that  they  face  the  same  price  for  this  commodity  and  therefore  have  a  price  difference  of  zero). 

31 There are two obstacles to using inter-regional price differentials to infer trade costs in wider settings than
that  employed  here.  First,  the  commodity  whose  price  is  being  compared  over  space  must  be  identical  in  the  two 
regions — for  example,  Broda  and  Weinstein  (2007)  use  barcode  data  to  illustrate  the  misleading  inferences  that 
have  been  drawn  from  comparing  prices  of  commodities  that  are  similar,  but  not  identical,  across  the  Canada-US 
border.  Second,  even  with  a  homogeneous  commodity,  only  if  two  regions  actually  trade  the  commodity  will  their 
inter-regional  price  difference  be  equal  to  their  bilateral  trade  cost.  Restricting  attention  to  a  commodity  that  is 
only  made  in  one  region  but  is  consumed  elsewhere,  as  I  do  in  this  paper,  helps  to  ensure  that  the  commodity  is 
homogeneous  and  guarantees  that  the  commodity  was  actually  traded  between  regions. 

32 This prediction is common to most models of spatial equilibrium. A class of exceptions is those with some forms
of imperfect competition and in which producers can charge separate prices in separate markets, as in Brander and
Krugman (1983) or Melitz and Ottaviano (2007). However, my empirical application of this prediction will be to
salt, which was produced under strict government license at a small number of locations and then had to be sold
(under conditions of the license) to an unrestricted trading community at the ‘factory’ gate (United Provinces of
Agra and Oudh 1868). That is, in this setting, producers only charged one factory gate price.

trade flows, conditional on importer- and exporter-specific terms.33
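To see how equation (8) identifies $\theta_k$ once importer and exporter terms are accounted for, the sketch below generates trade flows from equation (5) and then recovers $\theta$ from a ratio of flows in which the origin-specific and destination-specific terms cancel. This ratio is a shortcut used here for brevity; the procedure described in the text conditions on importer and exporter fixed effects instead. All numbers and function names are hypothetical.

```python
import math


def trade_flows(A, r, T, X_total, theta):
    """X[o][d] = pi_od * X_d, with pi_od from the normalized form of equation (5)."""
    D = len(A)
    X = [[0.0] * D for _ in range(D)]
    for d in range(D):
        terms = [A[o] * (r[o] * T[o][d]) ** (-theta) for o in range(D)]
        phi_d = sum(terms)
        for o in range(D):
            X[o][d] = terms[o] / phi_d * X_total[d]
    return X


def implied_theta(X, T, o, d):
    """ln[(X_od X_do) / (X_oo X_dd)] = -theta * (ln T_od + ln T_do) when T_oo = T_dd = 1."""
    log_flow_ratio = math.log(X[o][d] * X[d][o] / (X[o][o] * X[d][d]))
    log_cost = math.log(T[o][d]) + math.log(T[d][o])
    return -log_flow_ratio / log_cost


theta_true = 4.0
A = [1.0, 2.0, 1.5]
r = [1.0, 1.1, 0.9]
X_total = [10.0, 20.0, 15.0]  # total expenditure in each destination
T = [[1.0, 1.4, 1.8], [1.4, 1.0, 1.3], [1.8, 1.3, 1.0]]

X = trade_flows(A, r, T, X_total, theta_true)
estimates = [implied_theta(X, T, o, d) for o, d in [(0, 1), (0, 2), (1, 2)]]
print([round(e, 6) for e in estimates])
```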


Prediction  3:  Railroads  Reduce  the  Responsiveness  of  Prices  to  Local  Productivity  Shocks: 
Unfortunately, the multiple general equilibrium interactions in the model are too complex to admit a
closed-form solution for the effect of reduced trade costs on agricultural prices. To make progress in
generating qualitative predictions (to guide my empirical analysis) I therefore assume a much simpler
environment for Predictions 3-5. I assume: there are only three regions (called X, Y and Z); there
is only one commodity (so I will dispense with the $k$ superscripts on all variables); the regions are
symmetric in their exogenous characteristics (ie $L_o = L$ and $A_o = A$ for all regions $o$); and the three
regions have symmetric trade costs with respect to each other.34 I consider the comparative statics
from a local change around this symmetric equilibrium, where it is straightforward to show that:

1. $\frac{d}{dT_{YX}} \left( \frac{dp_X}{dA_X} \right) < 0$: The responsiveness of prices in a region (say, X) to productivity shocks in
the same region (ie $\frac{dp_X}{dA_X} < 0$) is weaker (ie less negative) when the region has low trade costs
to another region (say, Y).

2. $\frac{d}{dT_{YX}} \left( \frac{dp_X}{dA_Y} \right) > 0$: The responsiveness of prices in a region (say, X) to productivity shocks in
any other region (say region Y, so the price responsiveness of interest here is $\frac{dp_X}{dA_Y} < 0$) is
stronger (ie more negative) when the cost of trading between these two regions (ie $T_{YX}$) is low.


Prediction  4:  Railroads  Increase  Real  Income  Levels: 

My focus here (and in Prediction 5) is on real income (which, as Prediction 6 shows explicitly, is
equal to welfare in this model). To simplify notation, let $W_o$ represent real income per unit land
area (ie $W_o = \frac{r_o}{\tilde{P}_o}$, where $\tilde{P}_o$ is the aggregate price index in region $o$, defined explicitly in
Prediction 6). Then it is straightforward to show that the following results hold around the symmetric
three-region equilibrium introduced in Prediction 3:

1. $\frac{dW_X}{dT_{YX}} < 0$: Real income in a region (say, X) rises when the cost of trading between that region
and any other region (say, Y) falls.

2. $\frac{dW_X}{dT_{YZ}} > 0$: Real income in a region (say, X) falls when the cost of trading between the two
other regions (ie $T_{YZ}$) falls.

33 A  number  of  theoretical  trade  frameworks  also  predict  a  gravity  equation  for  trade  flows.  Examples  include 
Anderson  (1979),  Deardorff  (1998),  Helpman,  Melitz,  and  Rubinstein  (2008)  and  Chaney  (2008).  In  a  traditional 
gravity  equation,  bilateral  trade  flows  (for  each  commodity)  are  proportional  to  the  expenditure  of  the  importing 
region,  the  output  of  the  exporting  region,  and  inversely  proportional  to  the  bilateral  cost  of  trading  between  the 
two regions. Equation (5) can be easily manipulated to take this form. However, what matters for my empirical
procedure is simply that, conditional on importer and exporter fixed effects, bilateral trade costs reduce bilateral
trade flows, as in equation (5) and in a traditional gravity equation.

34 An alternative means of obtaining analytical predictions would be to invoke the commonly-used assumption
that one commodity can be traded at zero trade cost, and is important enough to be produced in positive quantities
everywhere and always. (This equates $r_o$ to the nominal productivity in this zero trade cost sector.) It is difficult
to imagine a commodity that satisfies these conditions in colonial India.

These results suggest that a reduction in trade costs in one part of the network is not good for
all regions. A railroad project that reduces trade costs between two regions will raise welfare in
these two regions; but this project will reduce welfare in the third, excluded region whose trade
costs were unaffected by the project. This negative effect on excluded regions arises because of two
effects: first, the excluded region’s trading partners’ land rental costs have increased (because these
partners’ own trade costs have fallen), which raises the prices of the commodities that the partners
ship to the excluded region; and second, the excluded region loses demand for its exports because
its trading partners now have a cheaper supplier (in each other).
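The three-region logic can be illustrated numerically. The sketch below solves the single-commodity model before and after a hypothetical "railroad" lowers the trade cost between X and Y from 1.5 to 1.2, leaving Z's trade costs unchanged. It is an illustration under assumed parameter values, with a damped rent iteration of my own choosing; real income is computed as the land rent divided by the price level, up to a multiplicative constant that does not affect the comparison.

```python
def trade_shares(A, r, T, theta):
    """pi[o][d]: share of region d's spending that goes to origin o."""
    D = len(A)
    pi = [[0.0] * D for _ in range(D)]
    for d in range(D):
        terms = [A[o] * (r[o] * T[o][d]) ** (-theta) for o in range(D)]
        phi_d = sum(terms)
        for o in range(D):
            pi[o][d] = terms[o] / phi_d
    return pi


def solve_rents(A, L, T, theta, iterations=2000):
    """Damped fixed-point iteration on the trade balance conditions."""
    D = len(A)
    r = [1.0] * D
    step = 1.0 / (1.0 + theta)  # damping exponent: an assumption for stability
    for _ in range(iterations):
        pi = trade_shares(A, r, T, theta)
        sales = [sum(pi[o][d] * r[d] * L[d] for d in range(D)) for o in range(D)]
        r = [r[o] * (sales[o] / (r[o] * L[o])) ** step for o in range(D)]
        r = [ro / r[0] for ro in r]  # region X's rent is the numeraire
    return r


def real_income(A, L, T, theta):
    """W_d = r_d / p_d, dropping the constant in the price level."""
    r = solve_rents(A, L, T, theta)
    D = len(A)
    W = []
    for d in range(D):
        phi_d = sum(A[o] * (r[o] * T[o][d]) ** (-theta) for o in range(D))
        W.append(r[d] * phi_d ** (1.0 / theta))
    return W


theta = 4.0
A = [1.0, 1.0, 1.0]  # regions X, Y, Z are symmetric
L = [1.0, 1.0, 1.0]
T_before = [[1.0, 1.5, 1.5], [1.5, 1.0, 1.5], [1.5, 1.5, 1.0]]
T_after = [[1.0, 1.2, 1.5], [1.2, 1.0, 1.5], [1.5, 1.5, 1.0]]  # railroad joins X and Y

W_before = real_income(A, L, T_before, theta)
W_after = real_income(A, L, T_after, theta)
for name, before, after in zip("XYZ", W_before, W_after):
    print(f"region {name}: real income {before:.4f} -> {after:.4f}")
```

In this example the two connected regions gain and the excluded region loses, in line with the two results above.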


Prediction  5:  Railroads  Reduce  Real  Income  Volatility: 

This prediction concerns the effect of an exogenous change in productivity on a region’s real income.
If the exogenous productivity terms are stochastic (as in my empirical setting) then a reduction in
the responsiveness of real income to this stochastic production technology will reduce real income
volatility. Around the three-region symmetric equilibrium:

1. $\frac{dW_X}{dA_X} > 0$: Real income in a region (say, X) rises when its productivity ($A_X$) increases.

2. $\frac{d}{dT_{YX}} \left( \frac{dW_X}{dA_X} \right) > 0$: The effect of productivity ($A_X$) on real income in a region (say, X) falls
when the cost of trading between this region and any other region (say, Y) falls.

This suggests another potential welfare gain from railroads (to the extent that real income volatility
affects consumption volatility and consumers are risk averse).


Prediction  6:  There  Exists  a  Sufficient  Statistic  for  the  Welfare  Gains  from  Railroads: 

Given the utility function in equation (1), the indirect utility function per unit of land (denoted by
$W_o$ as in Predictions 4 and 5) in region $o$ is

$$W_o = \frac{r_o}{\prod_{k=1}^{K} (\tilde{p}_o^k)^{\mu_k}}. \tag{9}$$

The numerator of this expression is nominal income (per unit land area) in region $o$ relative to the
numeraire, and the denominator is the exact consumer price index across all commodities $k$, denoted
by $\tilde{P}_o$. That is, welfare is equal to real income. Using the bilateral trade equation (5) evaluated at
$d = o$, (log) real income per unit of land (defined as $W_o$ as in Prediction 4) can be re-written as

$$\ln W_o = \Omega + \sum_k \frac{\mu_k}{\theta_k} \ln A_o^k - \sum_k \frac{\mu_k}{\theta_k} \ln \pi_{oo}^k, \tag{10}$$

where $\Omega = -\sum_k \mu_k \ln \gamma^k$. This result states that welfare is a function of only two terms: local
productivity ($A_o^k$), and ‘openness’ ($\pi_{oo}^k$, ie the fraction of region $o$’s expenditure that region $o$ buys
from itself). Because of the complex general equilibrium relationships in the model, the full vector
of trade costs (between every bilateral pair of regions), the full vector of productivity terms in other
regions, and the sizes of every region all influence welfare in region $o$. But these terms (that is,
every exogenous variable in the model other than local productivity) affect welfare only through
their effect on openness. Put another way, openness (the appropriately weighted sum of $\pi_{oo}^k$ terms
over goods $k$) is a sufficient statistic for welfare in region $o$, once local productivity is controlled for.
If railroads affected welfare in India through the mechanism in the model (by reducing trade costs,
giving rise to gains from trade), then Prediction 6 states that they did so only by changing $\pi_{oo}^k$.
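Equation (10) follows from equation (9) by solving equation (5) at $d = o$ for $r_o / p_o^k$ and using the exact price index $\tilde{p}_o^k = \gamma^k [\sum_i A_i^k (r_i T_{io}^k)^{-\theta_k}]^{-1/\theta_k}$. The sketch below checks the identity numerically for two commodities. The parameter values are hypothetical, trade costs are assumed to be the same for both commodities, and land rents are arbitrary (the identity does not require them to be equilibrium values).

```python
import math

# Hypothetical parameters for K = 2 commodities and D = 3 regions.
mu = [0.6, 0.4]         # Cobb-Douglas expenditure shares (sum to one)
theta = [4.0, 6.0]      # productivity dispersion parameters
sigma = [3.0, 2.5]      # elasticities of substitution, each below 1 + theta_k
A = [[1.0, 2.0, 1.5],   # A[k][o]: productivity of commodity k in region o
     [0.8, 1.2, 2.0]]
r = [1.0, 1.1, 0.9]     # land rents (any positive values work for the identity)
T = [[1.0, 1.4, 1.8], [1.4, 1.0, 1.3], [1.8, 1.3, 1.0]]
K, D = len(mu), len(r)


def gamma_k(k):
    """Constant in the exact CES price index for commodity k."""
    return math.gamma((theta[k] + 1.0 - sigma[k]) / theta[k]) ** (1.0 / (1.0 - sigma[k]))


def phi(k, o):
    """sum_i A_i^k (r_i T_io)^(-theta_k) for destination o."""
    return sum(A[k][i] * (r[i] * T[i][o]) ** (-theta[k]) for i in range(D))


def log_welfare_direct(o):
    """Equation (9): ln r_o minus the mu-weighted sum of log exact price indices."""
    total = math.log(r[o])
    for k in range(K):
        p_tilde = gamma_k(k) * phi(k, o) ** (-1.0 / theta[k])
        total -= mu[k] * math.log(p_tilde)
    return total


def log_welfare_sufficient_statistic(o):
    """Equation (10): Omega + sum_k (mu_k/theta_k) (ln A_o^k - ln pi_oo^k)."""
    total = -sum(mu[k] * math.log(gamma_k(k)) for k in range(K))  # Omega
    for k in range(K):
        pi_oo = A[k][o] * r[o] ** (-theta[k]) / phi(k, o)  # own trade share, T_oo = 1
        total += mu[k] / theta[k] * (math.log(A[k][o]) - math.log(pi_oo))
    return total


for o in range(D):
    print(o, round(log_welfare_direct(o), 6), round(log_welfare_sufficient_statistic(o), 6))
```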


3.3  From  Theory  to  Empirics 

To relate the static model in section 3 to my dynamic empirical setting (with 70 years of annual
data) I take the simplest possible approach and assume that none of the goods in the model can
be stored, and that inter-regional lending is not possible. Furthermore, I assume that the
stochastic production process described in section 3.1 is drawn independently in each period. These
assumptions imply that the static model simply repeats every period, with independence of all
decision-making across time periods. Throughout the remainder of the paper I therefore add the
subscript ‘t’ to all of the variables (both exogenous and endogenous) in the model, but I assume
that all of the model parameters ($\theta_k$, $\sigma_k$, and $\mu_k$) are fixed over time.


The six theoretical predictions outlined in section 3.2 take a naturally recursive order, both for estimating the model’s parameters and for tracing through the impact of railroads on welfare in India. I follow this order in the six empirical sections that follow (i.e., Steps 1-6). In Step 1, I evaluate the extent to which the railroads reduced trade costs within India. To do this I use Prediction 1 to relate the unobserved trade costs term in the model ($T^k_{odt}$) to observed features of the transportation network. In Step 2, I use Prediction 2 to measure how much the reduced trade costs found in Step 1 increased trade in India. This relationship allows me to estimate the unobserved model parameter $\theta_k$, and to relate the unobserved productivity terms ($A^k_{ot}$)³⁵ to rainfall, which is an exogenous and observed determinant of agricultural productivity. Steps 1 and 2 therefore deliver estimates of all of the model’s parameters.

In Step 3, I test Prediction 3 and evaluate the extent to which lower trade costs reduced price responsiveness to rainfall shocks (a test for market integration, as in a small open economy price responsiveness should be zero), and increased the transmission of rainfall shocks between regions. In Step 4, I test Prediction 4 by estimating how the level of a district’s real income is affected when the railroad network is extended to that district, and when it is instead extended to other nearby districts. Step 5 performs a similar test in the context of Prediction 5, on the volatility of real income. However, the empirical findings in Steps 4 and 5 are reduced-form in nature and could arise through a number of possible mechanisms.³⁶ Therefore, in Step 6 I use the sufficient statistic suggested by Prediction 6 to compare the reduced-form effects of railroads on the level and volatility of real income (found in Steps 4 and 5) with the effects predicted by the model (as estimated in Steps 1 and 2).

35 The productivity terms $A^k_{ot}$ are unobserved because they represent the location parameter on region $o$’s potential productivity distribution of commodity $k$, in equation (2). The productivities actually used for production in region $o$ will be a subset of this potential distribution, where the scope for trade endogenously determines how the potential distribution differs from the distribution actually used to produce.

36 For example, railroads could have: reduced the cost of technology transfer between regions, or the monitoring costs of multi-regional enterprises (as in Ramondo and Rodriguez-Clare (2008), who construct a model similar to that here in which there is diffusion of technology and multinational production); encouraged factor mobility, potentially giving rise to efficiency gains if factors are heterogeneous, or increasing the elasticity of labor supply (as Jayachandran (2006) found in post-Independence India); increased the size of the market, encouraging innovation (as Sokoloff (1988) found in the case of US canals and patenting behavior) or allowing economies of scale to be exploited (as found in Ades and Glaeser (1999)); or even altered the political environment in favor of a commercial class that favored growth-enhancing institutions (as Acemoglu, Johnson, and Robinson (2005) argue explained the growth of European port cities with access to the new trade opportunity of Atlantic trade post-1500).


4 Empirical Step 1: Railroads and Trade Costs

In the first step of my empirical analysis I estimate the extent to which railroads reduced the cost of trading within India. Because this paper stresses a trade-based mechanism for the impact of railroads on welfare, it is important to establish that railroads actually reduced trade costs. Further, the relationship between railroads and trade costs, which I estimate in this section, is an important input for the five empirical steps that follow.


4.1 Empirical Strategy

Researchers rarely observe trade costs.³⁷ But Prediction 1 suggests an instance under which trade costs can be inferred: if a homogeneous commodity can only be made in one region, then the difference in retail prices (of that commodity) between the origin region and any other consuming region is equal to the cost of trading between the two regions.³⁸

Throughout Northern India, several homogeneous types of salt were consumed, but each of these varieties could only be made in one unique location. Traders and consumers would speak about ‘Kohat salt’ (which could only be produced at the salt mine in the Kohat region) as a different commodity from ‘Sambhar salt’ (which could only be produced at the Sambhar Salt Lake).³⁹ I have collected data on salt prices in Northern India, in which the prices of eight regionally-differentiated types of salt are reported in 124 districts. Crucially, because salt is an essential commodity, it was consumed throughout India both before and after the construction of railroads.

37 Even when shipping receipts are observed, as in Hummels (2007), these may fail to capture other barriers to trade, such as the time goods spend in transit (a focus of Evans and Harrigan (2005)), or the risk of damage or loss in transit (a major concern in colonial India). In lieu of direct measures of trade costs, a large literature, surveyed by Anderson and van Wincoop (2004), uses a proxy variable strategy (similar to the one I employ in this section) to estimate trade costs.

38 Anderson and van Wincoop (2004) suggest (on p. 78) the solution I pursue here: “A natural strategy would be to identify the source [region] for each product. We are not aware of any papers that have attempted to measure trade barriers this way.”

39 The leading (nine-volume) commercial dictionary in colonial India, Watt (1889), describes the market for salt in this manner, as do Aggarwal (1937) and the numerous provincial Salt Reports that were brought out each year.

I use this salt price data, with the help of Prediction 1, to estimate how Indian railroads reduced trade costs. To do this I estimate equation (7) of Prediction 1 as follows:⁴⁰

$$\ln p^o_{dt} = \underbrace{\beta^o_{ot}}_{= \ln p^o_{ot}} + \underbrace{\beta^o_{od} + \phi^o_{od} t + \delta \ln TC(R_t)_{odt} + \varepsilon^o_{odt}}_{= \ln T^o_{odt}}. \qquad (11)$$


In this equation, $p^o_{dt}$ is the price of type-$o$ salt (that is, salt that can only be made in region $o$) in destination district $d$ in year $t$. I estimate this equation with an origin-year fixed effect⁴¹ ($\beta^o_{ot}$) to control for the price of type-$o$ salt in its region of origin $o$ (i.e., $p^o_{ot}$) because I do not observe salt prices exactly at the point where they leave the source. (My price data is at the district level and was recorded as the average price of goods over 10-15 retail markets in a district.)

The remainder of equation (11) describes how I model the relationship between trade costs $T^o_{odt}$, which are unobservable, and the railroad network (denoted by $R_t$), which is observable.⁴² I use two different proxy variables, denoted by $TC(R_t)_{odt}$ and explained in detail below, that relate trade costs to the railroad network. This specification includes an origin-destination fixed effect ($\beta^o_{od}$), which controls for all of the time-invariant determinants of the cost of trading salt between districts $o$ and $d$ (such as the distance from $o$ to $d$, or caste-based or ethno-linguistic differences between $o$ and $d$ that may hinder trade).⁴³ The specification also includes a separate trend term ($\phi^o_{od} t$) for each origin-destination pair; these trend terms control for any trade costs between $o$ and $d$ that change over time at a constant (log) rate. Finally, $\varepsilon^o_{odt}$ is an error term that captures any remaining unobserved determinants of trade costs (or measurement error in $\ln p^o_{dt}$).⁴⁴
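To see why the origin-year fixed effect can stand in for the unobserved origin price, it may help to compare two destinations that buy the same type of salt in the same year. The sketch below uses the notation of equation (11) and assumes, as Prediction 1 does, that the log destination price equals the log origin price plus the log trade cost; $d'$ is a hypothetical second destination introduced only for illustration.

```latex
\begin{align*}
\ln p^o_{dt}  &= \ln p^o_{ot} + \ln T^o_{odt}, \\
\ln p^o_{d't} &= \ln p^o_{ot} + \ln T^o_{od't}, \\
\ln p^o_{dt} - \ln p^o_{d't} &= \ln T^o_{odt} - \ln T^o_{od't}.
\end{align*}
```

The origin price $\ln p^o_{ot}$ cancels in the last line, so it does not need to be measured: any term common to all destinations of type-$o$ salt in year $t$ is absorbed by $\beta^o_{ot}$.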

I use two different measures for $TC(R_t)_{odt}$, the proxy for the unobservable trade costs between the origin $o$ and destination $d$ districts in any year $t$:

1. Bilateral railroad dummy variable: I denote this variable by $RAIL_{odt}$. This dummy variable is equal to one in all years when it is possible to travel from district $o$ to district $d$ by railroad (and zero otherwise).⁴⁵ This proxy variable has the advantage of simplicity. But the coefficient on this variable is likely to be biased downwards because the spread of the railroad network will potentially reduce trade costs between $o$ and $d$ in years before they are fully connected by a railroad line. Furthermore, this variable ignores any heterogeneity in the effect of railroads on trade costs between two districts, such as that due to the distance between them, the directness of their railroad connection, or the local non-railroad transportation system.

2. Lowest-cost route distance: I denote this variable by $LCR(R_t, \alpha)_{odt}$. This measure models the cost of trading goods between any two locations under the assumption that agents take the lowest-cost route available to them, using any mode of transportation.⁴⁶ Two inputs are needed to calculate the lowest-cost route between districts $o$ and $d$ in year $t$. The first input is the network of available transportation routes open in year $t$, which I denote by $R_t$. A network is a collection of nodes and arcs. In my application, nodes are finely-spaced points in space, and arcs are available means of transportation between the nodes (hence an arc could be a rail, river, road or coast connection). In modeling this network I allow agents to travel on navigable rivers, the coastline, the road network (which I take to be continuous over space and hence connecting any two nodes along the straight line between them), and the railroad network open in year $t$. The second input is the relative cost of traveling along each arc, which depends on which mode of transportation the arc represents. I model these costs as being proportional to distance,⁴⁷ where the proportionality (the relative per unit distance cost) of using each mode is denoted by the vector of parameters $\alpha = (\alpha^{rail}, \alpha^{road}, \alpha^{river}, \alpha^{coast})$. I normalize $\alpha^{rail} = 1$ so the other three elements of $\alpha$ are costs relative to the cost of using railroads. Because of this normalization, $LCR(R_t, \alpha)_{odt}$ is measured in units of railroad-equivalent kilometers; in this sense, a finding that all of the non-rail elements of $\alpha$ are greater than one implies that railroads effectively shrank distance, as measured in railroad-equivalent units. The parameter $\alpha$ is unknown, so I treat it as a vector of parameters to be estimated.⁴⁸ Conditional on a value of $\alpha$, it is possible to calculate $LCR(R_t, \alpha)_{odt}$, a calculation that is made computationally feasible by Dijkstra’s shortest-path algorithm (Ahuja, Magnanti, and Orlin 1993). But since $\alpha$ is unknown, I estimate it using non-linear least squares (NLS). That is, I search over all values of $\alpha$, recomputing the lowest-cost routes at each step, to find the value that minimizes the sum of squared residuals in equation (11).

40 The model in section 3 underpinning Prediction 1 assumes that trade costs take an ad valorem (that is, per unit value) form, which is inconsistent with the evidence in Hummels and Skiba (2004). To test for a non-ad valorem trade cost specification I have estimated equation (11) with an additional interaction term between $\ln TC(R_t)$ and the level of an excise tax charged on salt as it left the ‘factory’ gate. This was a very high tax (in the range of 100-300 percent of the value of salt) that initially varied across provinces, but which fell precipitously in 1874, 1878 and 1883 so that all provinces had the same tax rate (I take the data on excise rates from Aggarwal (1937)). However, the coefficient on this interaction term is never statistically significant. This is consistent with my assumption that, regardless of the factory gate price of salt, trade costs took a form that was proportional to the price of the commodity shipped.

41 That is, each salt origin $o$ has its own fixed effect in each year $t$. I use this notation when referring to fixed effects throughout this paper.

42 An alternative empirical strategy would be to estimate trade costs directly from equation (11), by simply equating trade costs to observed price differences. However, this method faces two drawbacks when compared to the method I follow here. First, it would only uncover trade cost estimates for the $o$-$d$-$t$ observations for which salt prices are observed separately for each region of origin (that is, no estimates would be available in Southern India). Second, it would be vulnerable to the concern that $p^o_{ot}$ is not measured exactly at the point where type-$o$ salt leaves the source in region $o$.

43 Rauch (1999) and Anderson and van Wincoop (2004) document a series of findings that are consistent with large communication-based barriers to trade in contemporary international trade data.

44 In this specification and all others in this paper I allow this error term to be heteroskedastic and serially correlated within districts (or trade blocks, in section 5) in an unspecified manner.

45 Andrabi and Kuehlwein (2005) and Keller and Shiue (2007b) use this dummy variable approach when studying the effect of railroads on price differences in 19th Century India and Europe, respectively.

46 To the best of my knowledge, this is a new method for measuring trade costs over multiple modes of transportation over a network, where users are free to choose their route over the network. Houde (2008) is related, as he uses Dijkstra’s algorithm to find the likely commuting paths of automobile drivers over a road network (in order to define retail gasoline markets). But unlike my procedure, he treats the parameters that govern these users’ path choices as known.

47 This rules out any fixed costs of switching modes of transportation (such as handling charges), or other economies of scale in the transportation sector either internal to trading firms or external to them (such as congestion effects). It is difficult to know whether these features were applicable to non-rail transportation in colonial India, but the simple freight rates charged by the railroads did not feature either a fixed handling charge or a bulk discount for large shipments, the trading sector was characterized by a large mass of small-scale traders (Bayly 1983, Yang 1999), and congestion effects on the railroads were rarely a deterrent to trade (Sanyal 1930).

48 As discussed in section 2, relative freight rates for each mode of transport are available from a number of historical sources. However, as with overall trade costs, the cost of using railroads relative to another mode of transport may include elements (such as increased certainty or time savings) that are not included in observed freight rates. Consistent with this idea, I find in Table 2 that my estimates of $\alpha$ are higher (for road, river and coastal travel relative to rail travel) than the relative freight rates observed in these historical sources.
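The lowest-cost route procedure described in item 2 above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the estimation code used for this paper: the networks are tiny hypothetical examples, only the road cost is searched over (on a coarse grid rather than a full NLS routine), the regression omits the fixed effects and trends of equation (11), and the names `lowest_cost_route`, `ols_ssr` and `fit_alpha_road` are invented for the example.

```python
import heapq
import math

def lowest_cost_route(arcs, alpha, origin, dest):
    """Dijkstra's algorithm; each arc costs alpha[mode] * km (rail-equivalent km)."""
    graph = {}
    for a, b, mode, km in arcs:
        cost = alpha[mode] * km
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))  # arcs can be used both ways
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == dest:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, cost in graph.get(node, []):
            new = dist + cost
            if new < best.get(nxt, math.inf):
                best[nxt] = new
                heapq.heappush(queue, (new, nxt))
    return math.inf

def ols_ssr(x, y):
    """Sum of squared residuals from regressing y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return sum((yi - my - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y))

def fit_alpha_road(networks, log_prices, grid):
    """Grid search: recompute lowest-cost routes for each candidate road cost."""
    best = None
    for a_road in grid:
        alpha = {"rail": 1.0, "road": a_road}
        x = [math.log(lowest_cost_route(n, alpha, "o", "d")) for n in networks]
        ssr = ols_ssr(x, log_prices)
        if best is None or ssr < best[0]:
            best = (ssr, a_road)
    return best[1]

# Hypothetical networks, one per origin-destination observation: (from, to, mode, km).
networks = [
    [("o", "m", "road", 50.0), ("m", "d", "rail", 300.0)],
    [("o", "d", "road", 200.0)],
    [("o", "m", "rail", 400.0), ("m", "d", "road", 10.0)],
    [("o", "m", "road", 100.0), ("m", "d", "rail", 100.0), ("o", "d", "road", 150.0)],
]

# Noise-free synthetic log prices generated with a known road cost and elasticity.
TRUE_ALPHA = {"rail": 1.0, "road": 6.0}
TRUE_DELTA = 0.25
log_prices = [
    TRUE_DELTA * math.log(lowest_cost_route(n, TRUE_ALPHA, "o", "d")) for n in networks
]

alpha_road_hat = fit_alpha_road(networks, log_prices, [2.0, 4.0, 6.0, 8.0])
print("estimated relative road cost:", alpha_road_hat)
```

The design point this illustrates is that the regressor itself depends on the parameter being estimated: in the fourth network the cheapest route switches from the direct road to the road-plus-rail path as the road cost rises, so routes must be recomputed at every candidate value.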

4.2  Data 

I use data on retail prices of 8 types of salt, observed annually from 1861 to 1930 in 124 districts of Northern India (the region in which salt prices were reported broken down by region of origin). Further details on the data I use in this and other sections of the paper are provided in Appendix A.

4.3  Results 

Column 1 of Table 2 presents results from estimating equation (11) by OLS using the bilateral railroad dummy ($RAIL_{odt}$) as the proxy for trade costs. The coefficient on this proxy variable is negative and statistically significant, indicating that when two regions are connected by a railroad line the cost of trading between them falls by approximately 10 percent. However, as argued above, this measure is likely to be biased toward zero and to ignore significant heterogeneity in the effect of railroads on trade costs.

To address these shortcomings of the bilateral railroad dummy variable, columns 2-5 present estimates of equation (11) using my alternative proxy variable for trade costs, the lowest-cost route distance ($LCR(R_t, \alpha)_{odt}$). In column 2 I estimate the effect of the lowest-cost route distance on trade costs when the relative costs of each mode ($\alpha$) are set to observed historical relative freight rates (in 1900). I use the relative per unit distance freight rates described in section 2 (at their midpoints): $\alpha^{road} = 4.5$, $\alpha^{river} = 3.0$, and $\alpha^{coast} = 2.25$ (all relative to the freight rate of railroad transport, normalized to 1). Column 2 demonstrates that the elasticity of trade costs with respect to the lowest-cost route distance, calculated at observed freight rates, is 0.135.
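To give a sense of the magnitudes in column 2, the following back-of-the-envelope sketch combines the observed relative road freight rate (4.5) with the estimated elasticity (0.135). The 100 km journey, the assumption that a new rail line exactly replaces the road over the same distance, and the function name `trade_cost_ratio` are hypothetical and chosen only for illustration.

```python
def trade_cost_ratio(lcr_before, lcr_after, elasticity):
    """Trade costs after relative to before, for a constant elasticity
    of trade costs with respect to lowest-cost route distance."""
    return (lcr_after / lcr_before) ** elasticity

ALPHA_ROAD = 4.5    # road cost per km relative to rail (observed freight rates)
ELASTICITY = 0.135  # column 2 elasticity of trade costs w.r.t. LCR distance

road_km = 100.0                     # hypothetical journey length
lcr_before = ALPHA_ROAD * road_km   # 450 railroad-equivalent km by road
lcr_after = 1.0 * road_km           # 100 railroad-equivalent km by rail

ratio = trade_cost_ratio(lcr_before, lcr_after, ELASTICITY)
print(f"trade costs after / before: {ratio:.3f}")
```

Under these assumptions trade costs on the route fall by roughly 18 percent. This is an illustration of how the two column 2 numbers interact, not a result reported in Table 2.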

However, as argued above, it is possible that these observed relative freight rates do not capture the full benefits of railroad transport relative to alternative modes of transportation. For this reason the NLS specifications in columns 3-5 estimate the relative freight rates (i.e., the parameters $\alpha$) that minimize the sum of squared residuals in equation (11). In column 3 I estimate equation (11) without district-source specific trends included and find an elasticity of salt prices with respect to the lowest-cost route distance of 0.255.

A potential concern with the specification in column 3 is that the lowest-cost route distance may be trending over time because an approaching railroad line will steadily reduce trade costs. Unobserved determinants of trade costs (such as a steadily improving institutional environment conducive to interregional trade) may also be trending over time, and there is a risk that these unobserved determinants may be attributed to the railroad network. I therefore allow each district-salt source pair to have its own (log) linear trend term in column 4. This reduces the coefficient on the lowest-cost route measure by a small amount, but this coefficient is still economically and statistically significant. Column 4 is my preferred specification. Even when controlling for all unobserved, time-constant and trending determinants of trade costs between all salt sources and destinations, reductions in trade costs along lowest-cost routes (estimated from time variation in these routes alone) have a large effect on salt prices.

The non-linear specification in column 4 also estimates the relative trade costs by mode that best explain observed salt price differentials. The estimated relative cost of each of the three alternative modes of transport is larger than one, implying that these alternative modes are more expensive (per unit distance) than rail travel. Further, each of these non-rail modes has higher estimated costs, relative to railroads, than historically observed freight rates. This suggests that the advantages of railroads in encouraging trade were significant, but not entirely reflected in observed freight rates.

To summarize the results in column 4: the coefficient on the lowest-cost route distance ($\delta$) is positive, which implies that trade costs increase with effective distance (in railroad-equivalent kilometers); and the estimated mode-specific per-unit distance costs ($\alpha$) are all much greater than one, implying that railroads were instrumental in reducing effective distance when compared to alternative modes of transportation (especially when compared to roads, which I find were almost eight times more costly to use per unit distance than railroads).

Finally,  in  column  5  I  include  both  the  bilateral  railroad  dummy  variable,  and  the  lowest-cost 
route  variable.  The  bilateral  railroad  dummy  is  no  longer  statistically  significantly  different  from 
zero,  and  its  point  estimate  is  much  smaller  than  in  column  1.  By  contrast,  the  lowest-cost  route 
trade  costs  variable  is  still  large  and  statistically  significant,  and  its  magnitude  is  similar  to  that 
in  column  4.  This  suggests  that  the  lowest-cost  route  measure  is  explaining  genuine  features  of 
the  railroad  network  as  it  impacted  on  salt  prices.  I  use  the  estimates  in  column  4,  my  preferred 
specification,  in  the  next  stages  of  my  empirical  strategy. 

5  Empirical  Step  2:  Railroads  and  Trade  Flows 

The first step of my empirical strategy demonstrated that railroads reduced trade costs. I now estimate the extent to which the reduction in trade costs brought about by India’s railroad network (estimated in Step 1) affected trade flows within India, and trade flows between India and its international trade partners. This step is important for two reasons. First, an expansion of trade volumes as a result of the railroad network is a necessary condition for the mechanism linking railroads to welfare gains in the model. Second, as I show below, estimating the model’s gravity equation allows all of the model’s parameters to be inferred. Equipped with these parameter estimates I am better able to test Predictions 3-6 in the sections that follow.

5.1  Empirical  Strategy 

Prediction 2 of the model suggests a particular relationship between bilateral trade flows and bilateral trade costs: a gravity equation describing trade between any two regions. Substituting the empirical specification for $T^k_{odt}$ introduced in equation (11) into equation (8) yields

$$\ln X^k_{odt} = \beta^k_{od} + \phi^k_{od} t + \ln A^k_{ot} - \theta_k \ln r_{ot} - \theta_k \delta \ln TC(R_t)_{odt} + \theta_k \ln p^k_{dt} + \ln X^k_{dt} + \varepsilon^k_{odt}. \qquad (12)$$

Here, $X^k_{odt}$ refers to the value of exports of commodity $k$ from region $o$ to region $d$ in year $t$ (and the other variables were defined in section 3).

I estimate two versions of this bilateral exports equation, each with a different goal in mind. The first version investigates whether the construction of India’s railroad network increased trade in India. To do this I estimate the equation

$$\ln X^k_{odt} = \beta^k_{ot} + \beta^k_{dt} + \beta^k_{od} + \phi^k_{od} t + \rho \ln TC(R_t)_{odt} + \varepsilon^k_{odt}. \qquad (13)$$

In this specification, the term $\beta^k_{ot}$ is an origin-year-commodity fixed effect and $\beta^k_{dt}$ is a destination-year-commodity fixed effect (the inclusion of these two fixed effects is suggested by the model in equation (12)); $\beta^k_{od}$ is an origin-destination-commodity fixed effect and the term $\phi^k_{od} t$ allows each origin-destination-commodity to have its own trend term (these two terms were motivated in section 4 by the concern that some costs of trading may be unobservable). The coefficient $\rho$ on the trade costs proxy variable $\ln TC(R_t)_{odt}$ is therefore estimated purely from time variation in the railroad network that affects an exporter differently across its trading destinations. Prediction 2 is that the coefficient $\rho$ will be negative: conditional on importer and exporter fixed effects (for each commodity), the lower trade costs brought about by railroads increase trade.⁴⁹

In following sections of this paper I assume that the trade cost parameters for salt (which I estimated in Step 1) apply to other commodities as well. One potential concern with this assumption is that the parameter $\delta$, which relates the lowest-cost route variable ($LCR(R_t, \alpha)_{odt}$) to trade costs, may vary across commodities. To explore this possibility while estimating equation (13) (which is pooled across commodities), I use $LCR(R_t, \alpha)_{odt}$ as a proxy for the trade cost term $TC(R_t)_{odt}$ and also include interaction terms between $LCR(R_t, \alpha)_{odt}$ and commodity-specific characteristics.⁵⁰ The characteristics I include are weight per unit value and the ‘freight class’ in which each commodity was placed by railroad companies, both measured at the start of the period. If the effect of $LCR(R_t, \alpha)_{odt}$ on trade flows does not vary across commodities according to these characteristics (which capture the most obvious reasons why per unit distance trade costs could differ by commodity) then this would be consistent with the assumption that trade costs for salt are representative of trade costs for other commodities. A second concern with this assumption is that the relative per unit distance cost of using each mode of transportation ($\alpha$) may also vary across commodities, so that my parameter estimates of $\alpha$, also obtained from salt, do not carry over to other commodities. I discuss evidence that is inconsistent with this second concern below.

49 The coefficient $\rho$ is not the general equilibrium elasticity of trade flows with respect to trade costs. This is because equation (12) also contains the endogenous variables: land rental rates $r_{ot}$, goods prices $p^k_{dt}$, and aggregate expenditure $X^k_{dt}$. I control for these endogenous variables (by the use of fixed effects) in my estimating equation (13), so they do not present a risk of bias to the estimation of $\rho$; however, comparative statics exercises (such as computing the general equilibrium elasticity of trade flows with respect to trade costs) must allow these endogenous variables to adjust. Following Anderson and van Wincoop (2003), the coefficient $\rho$ is best thought of as a partial equilibrium elasticity, which holds constant all of these necessary adjustments in other goods markets and the factor market. However, as these authors note, the sign of the true (general equilibrium) elasticity will be the same as that of the partial equilibrium elasticity estimated here.

50 In this specification $\ln LCR(R_t, \alpha)_{odt}$ is a generated regressor because the parameter $\alpha$ was estimated in Step 1. I therefore correct the standard errors in this regression to account for the presence of a generated regressor using a two-step bootstrap procedure.


The second version of equation (12) that I estimate takes the model more seriously in order to estimate unknown parameters of the model. Two sets of parameters remain unknown: the technology parameter $\theta_k$ (for each commodity), and the productivity parameters $A^k_{ot}$ (for each district, year and commodity). I first estimate the parameters $\theta_k$ by substituting the lowest-cost route distance proxy for trade costs estimated in Step 1 (i.e., $\hat\delta \ln LCR(R_t, \hat\alpha)_{odt}$, where $\hat\delta$ and $\hat\alpha$ were estimated in Step 1) into equation (12) to obtain

$$\ln X^k_{odt} = \beta^k_{ot} + \beta^k_{dt} + \beta^k_{od} + \phi^k_{od} t - \theta_k \hat\delta \ln LCR(R_t, \hat\alpha)_{odt} + \varepsilon^k_{odt}. \qquad (14)$$

In this specification, the coefficient on the lowest-cost route distance variable ($\hat\delta \ln LCR(R_t, \hat\alpha)_{odt}$) is exactly $\theta_k$ (entering with a negative sign). Intuitively, the scope for comparative advantage (the inverse of $\theta_k$) governs how much a reduction in trade costs translates into an expansion of trade. I therefore estimate this equation separately for each of the 85 commodities in my trade flows dataset, in order to estimate 85 values of $\theta_k$ (one for each commodity $k$).
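The identification idea behind equation (14) can be shown with a deliberately stripped-down sketch: once the regressor is the Step 1 trade cost proxy ($\hat\delta \ln LCR$), the slope of log exports on that regressor is $-\theta_k$. The example below assumes noise-free synthetic data for a single commodity and omits all fixed effects and trends; the numbers and the helper name `ols_slope` are hypothetical.

```python
def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

TRUE_THETA = 4.0  # hypothetical value used to generate the data

# Hypothetical values of the generated regressor, delta_hat * ln LCR.
scaled_log_lcr = [0.9, 1.1, 1.3, 1.6, 1.8]

# Log exports fall one-for-theta with the scaled trade cost proxy.
log_exports = [10.0 - TRUE_THETA * x for x in scaled_log_lcr]

# The regression slope is -theta, so flip the sign to recover theta.
theta_hat = -ols_slope(scaled_log_lcr, log_exports)
print(f"recovered theta: {theta_hat:.3f}")
```

In the actual specification the same slope is estimated separately for each commodity after conditioning on the fixed effects and trends in equation (14).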

Armed with the parameter estimates $\hat{\theta}_k$ it is then possible to estimate the other unknown variable
in the model, the unobserved productivity term, $A^k_{ot}$ (though this is only possible for agricultural
commodities). I relate $A^k_{ot}$ to observables by assuming that $A^k_{ot}$ is a function of a crop-specific
rainfall shock, denoted by $RAIN^k_{ot}$. As argued in section 2, rainfall was an important determinant of
agricultural productivity in India because irrigation was uncommon. However, a given distribution
of annual rainfall would affect each crop differently because each crop has its own annual timetable
for sowing, growing and harvesting, and these timetables differ from district to district. To shed
light on these crop- and district-specific agricultural timetables, I draw on the 1967 publication,
the Indian Crop Calendar (Directorate of Economics and Statistics 1967), which lists sowing, growing
and harvesting windows for each crop and district in my sample.51 To construct the variable
$RAIN^k_{ot}$, I use daily rainfall data to calculate the amount of rainfall in year $t$ that fell between the
first sowing date and the last harvest date listed for crop $k$ in district $o$.
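
The construction of the crop-specific rainfall variable can be sketched as follows. This is an illustration only: the function name `crop_rainfall`, the inclusive treatment of the two endpoint dates, the conversion from millimetres to metres, and the sample rainfall figures are all assumptions rather than details taken from the data appendix.

```python
from datetime import date

def crop_rainfall(daily_mm, first_sowing, last_harvest):
    """Total rainfall (in metres) that fell between the first sowing date and
    the last harvest date, both inclusive, given a {date: millimetres} mapping."""
    total_mm = sum(mm for day, mm in daily_mm.items()
                   if first_sowing <= day <= last_harvest)
    return total_mm / 1000.0

# Hypothetical daily rainfall for one district in one year.
daily = {
    date(1900, 6, 1): 12.0,    # inside the crop window
    date(1900, 7, 15): 30.0,   # inside the crop window
    date(1900, 11, 20): 5.0,   # inside the crop window
    date(1900, 12, 25): 8.0,   # after the last harvest date, so excluded
}
rain = crop_rainfall(daily, date(1900, 6, 1), date(1900, 11, 30))
print(rain)
```

Because the window differs by crop and district, the same daily series yields a different value of the rainfall variable for each crop grown in a district.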

It  is  then  possible  to  estimate  the  relationship  between  rainfall  and  productivity  by  noting  that 

51This  publication  describes  the  technology  of  agricultural  practice  (related  to  scheduling  of  activities)  in  each
district.  This  particular  aspect  of  agricultural  technology  is  unlikely  to  have  changed  between  my  sample  period  and 
1967  because  the  optimal  sowing  date  for  a  crop  depends  on  the  amount  of  water  in  the  soil  on  that  date,  which  is 
governed  by  the  type  of  soil  and  the  local  climate  (Mukerji  1915),  both  of  which  are  unlikely  to  change  over  this  time 
frame.  Nevertheless,  to  test  for  this  stability  of  agricultural  scheduling  I  use  the  earliest  Crop  Calendar  that  I  was 
able  to  access,  which  is  from  1908.  This  volume  presents  data  at  larger  geographic  areas  than  the  district.  However, 
when  I  calculate  crop-specific  sowing  and  growing  rainfall  amounts  for  these  larger  geographic  areas  the  correlation 
between  these  and  the  (area-weighted  average)  1967  district-level  amounts  is  0.78.  This  is  a  strong  correlation, 
especially  given  that  the  district-level  data  is  aggregated,  indicating  that  the  timing  of  agricultural  activities  has 
not  changed  dramatically  (from  1908  to  1967  at  least). 
the exporter-commodity-year fixed effect ($\beta^k_{ot}$) in equation (14) can be interpreted in the model as
$\beta^k_{ot} = \ln A^k_{ot} - \theta_k \ln r_{ot}$, by comparing equations (12) and (14).52 I model the relationship between
productivity ($A^k_{ot}$) and rainfall ($RAIN^k_{ot}$) in the simplest possible semi-log manner: $\ln A^k_{ot} = \kappa RAIN^k_{ot}$.53
Guided by this relationship, I estimate the parameter $\kappa$ in the following estimating equation:


$$\hat{\beta}^k_{ot} + \hat{\theta}_k \ln r_{ot} = \beta^k_o + \beta^k_t + \beta_{ot} + \kappa RAIN^k_{ot} + \varepsilon^k_{ot}. \qquad (15)$$


In this equation, $\hat{\beta}^k_{ot}$ is the estimated exporter-commodity-year fixed effect, and $\hat{\theta}_k$ is the
estimated technology parameter, both of which are estimated in equation (14) above. The terms $\beta^k_o$,
$\beta^k_t$, and $\beta_{ot}$ represent exporter-commodity, commodity-year and exporter-year fixed effects,
respectively. I include these terms to control for unobserved determinants of exporting success that are
constant along at least one of the three dimensions (regions, commodities and time). For example, the
exporter-commodity fixed effect ($\beta^k_o$) controls for all time-invariant factors that make region $o$
successful at exporting commodity $k$ (such as the region’s altitude). As a result, the coefficient $\kappa$ is
estimated purely from the variation in rainfall that occurs jointly over space, commodities and time. The
final term in equation (15) is an error term ($\varepsilon^k_{ot}$) that includes any determinants of exporting success,
other than rainfall, that vary across regions, commodities and time.


In summary, the method described in this second version of estimating equation (12) estimates
the parameter $\theta_k$ for each of the 85 goods $k$ for which I have trade data. This method also estimates
the relationship between the unobserved productivity terms $A^k_{ot}$ and crop-specific rainfall $RAIN^k_{ot}$
(governed by the parameter $\kappa$).


5.2  Data 


I estimate equations (13), (14) and (15) using over six million observations on Indian trade flows
that I have collected. The trade flow data cover both internal trade (between 45 trade
blocks of India) and external trade (between each of these 45 internal trade blocks and 23
foreign countries), over rail, river and coastal transport routes, for 85 commodities, annually from
1880 to 1920. When estimating equation (15), I use the crop-specific rainfall measure ($RAIN^k_{ot}$)
described briefly above (and in more detail in Appendix A) and, lacking reliable data on land rental
rates, I use nominal agricultural GDP per acre as a measure of $r_{ot}$ (since in my model these two
measures are equivalent).


52Intuitively, higher productivity in commodity $k$ ($A^k_{ot}$) will increase region $o$’s propensity to export to any
location, leading to a higher exporter fixed effect $\beta^k_{ot}$; however, higher productivity will also raise the land rental
rate ($r_{ot}$), decreasing the propensity to export and leading to a lower exporter fixed effect.

53It is also possible that the marginal productivity of rainfall is diminishing, becoming detrimental to production
at some point. With such effects in mind I have also estimated a specification where I include both $RAIN^k_{ot}$ and
$(RAIN^k_{ot})^2$. However, in this alternative specification, the coefficient on the squared amount of rainfall is actually
positive (implying increasing marginal productivity of rainfall), but never statistically significant.
5.3  Results


Table 3 presents results from this section. Column 1 contains estimates of equation (13) using OLS,
where the trade costs proxy used is the bilateral railroad dummy variable.54 The coefficient on the
railroad dummy variable is positive, statistically significant and very precisely estimated, even though,
as argued in section 4, the coefficient on this trade costs proxy is likely to be biased downwards.
This suggests that railroads significantly boosted trade, and provides support for prediction 2.

In column 2 of table 3 I estimate equation (13) again, this time with the lowest-cost route variable
used to proxy for trade costs instead of the railroad dummy variable. The lowest-cost route distance
proxy depends on the unknown parameters $\alpha$, the per unit distance trade costs along each mode of
transportation, relative to rail transport. In order to compute the lowest-cost route distance in
estimating equation (13), I use the estimated value of the parameters $\hat{\alpha}$ presented in column 4 of table
2. This requires the maintained assumption that the relative cost of transporting any commodity
by rail (relative to other modes) is the same as that for salt; that is, the per-unit distance trade cost
may differ across commodities, but in a way such that the cost of using non-rail transport
relative to rail transport is the same for all commodities.55 The results in column 2 provide further
support for prediction 2, as the lowest-cost route measure is estimated to reduce bilateral trade
(conditional on the fixed effects used) with a statistically significant elasticity of (minus) 1.3. This
result is in line with a large body of work on estimating gravity equations.56

In column 3 of table 3 I investigate the possibility that the elasticity of trade flows with respect
to lowest-cost route distance varies by commodity. I do this by including interaction terms
between the lowest-cost route distance variable and two commodity-specific characteristics: weight
per unit value (as observed in 1880 prices, averaged over all of India), and ‘freight class’ (an indicator
used by railroad companies in 1880 to distinguish between ‘high-value’ and ‘low-value’ goods).
The results in column 3 are not supportive of the notion that commodities had trade flow elasticities
with respect to trade costs that depend on weight, or freight class; that is, neither of these interaction
terms is significantly different from zero (nor are they jointly significantly different from zero).
This lends support to the maintained assumption throughout this paper that trade cost parameters


54Because  my  trade  flow  data  is  available  for  trade  blocks,  which  are  larger  than  districts,  I  define  the  ‘bilateral 
railroad  dummy’  variable  here  ($RAIL_{odt}$)  as  the  share  of  district  pairs  between  trade  block  o  and  trade  block  d  that
are  connected  by  the  railroad  network. 

55  One  piece  of  evidence  consistent  with  this  assumption  comes  from  data  on  district-to-district  trade  flows  (for 
each  of  15  goods,  one  of  which  is  salt)  in  Bengal  from  1877  to  1881,  along  each  of  the  three  modes  of  transport 
available  in  that  area  (rail,  river  and  road).  I  regress  log  bilateral  exports  by  road  relative  to  exports  by  rail  on 
exporter-importer-year  fixed  effects,  and  a  fixed  effect  for  each  commodity.  The  F-test  that  these  commodity-level 
fixed  effects  are  all  equal  to  each  other  has  a  p-value  of  0.34,  so  it  cannot  be  rejected  at  the  5  percent  level.  A  similar
test  for  a  regression  with  exports  by  river  relative  to  exports  by  rail  has  a  p-value  of  0.28.  These  results  are  consistent
with  the  view  that,  within  an  exporter-importer-year  cell,  goods  do  not  have  systematically  different  trade  costs. 

56Head  and  Disdier  (2008)  conduct  a  meta-study  of  103  papers  estimating  the  coefficient  on  bilateral  distance  in 
a  gravity  equation.  They  find  a  mean  estimate  of  this  coefficient  of  0.9  (with  90  percent  of  estimates  lying  between 
0.28  and  1.55).  That  my  result  is  higher  than  the  mean  estimate  in  these  103  papers  is  unsurprising  because  they 
were  estimated  primarily  on  post-1960  data.  Technological  improvements  in  transportation  (ocean  shipping  and  air 
freight)  and  telecommunications  are  likely  to  have  reduced  the  trade-impeding  effects  of  distance,  when  compared 
to  railroad  transportation  and  communication  in  1880-1920  India. 
for  the  shipment  of  salt  can  be  applied  to  other  commodities,  without  doing  injustice  to  the  data.


Finally, I estimate equation (14) in a manner that allows me to estimate all of the remaining
unknown parameters of the model. I begin by estimating equation (14), one commodity at a time (for
each of the 85 commodities in the trade flows data), in order to obtain estimates of the comparative
advantage parameters $\theta_k$ for each commodity. The mean across all of these 85 commodities is 5.2
(with a standard deviation across commodities of 2.1). This is slightly lower than the preferred
estimate of 8.28 in Eaton and Kortum (2002) obtained from intra-OECD trade flows in 1995, treating
all of the manufacturing sector as one commodity.57 However, the mean value of $\theta_k$ across only the
17 principal agricultural commodities for which I have output and price data (and therefore use
heavily below) is 3.8 (with a standard deviation of 1.2). This suggests a greater scope for
comparative-advantage-based gains from trade among agricultural goods than among manufacturing goods,
at least in colonial India.


I then estimate equation (15) to obtain an estimate of $\kappa$, the parameter that relates crop-specific
rainfall to (potential) productivity ($A^k_{ot}$ in the model). I estimate a value of 0.441 for $\kappa$ (with a
standard error of 0.082), implying that a one standard deviation (0.605 m) increase in crop-specific
rainfall causes a 27 percent increase in agricultural productivity (as defined by $A^k_{ot}$ in the model).
This suggests that rainfall has a positive and statistically significant effect on productivity, as
expected given the importance of water in crop production and the paucity of irrigated agriculture in
colonial India (as discussed in section 2).
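
The arithmetic behind the 27 percent figure can be checked with a short calculation. This is a sketch that uses only the numbers reported in the paragraph above; the variable names are illustrative, and reading "27 percent" as a log-point approximation is an assumption about how the figure was computed.

```python
import math

kappa = 0.441       # estimated rainfall coefficient reported in the text
se_kappa = 0.082    # its reported standard error
sd_rain_m = 0.605   # one standard deviation of crop-specific rainfall, in metres

# Under ln A = kappa * RAIN, a one-s.d. increase in rainfall raises ln A by:
log_change = kappa * sd_rain_m

approx_pct = 100 * log_change                  # log-point approximation
exact_pct = 100 * (math.exp(log_change) - 1)   # exact percentage change
t_stat = kappa / se_kappa                      # ratio of estimate to standard error

print(round(approx_pct, 1), round(exact_pct, 1), round(t_stat, 1))
```

The log-point approximation rounds to 27 percent; the exact percentage change implied by the semi-log form is somewhat larger (about 31 percent).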

In  summary,  the  results  from  this  section  demonstrate  that  railroads  significantly  expanded  trade 
in  India.  This  finding  is  in  line  with  Prediction  2  and  suggests  that  the  expansion  of  trade  brought 
about  by  the  railroad  network  could  have  given  rise  to  welfare  gains  due  to  increasingly  captured 
gains  from  trade.  A  second  purpose  of  this  section  was  to  use  the  empirical  relationship  between 
trade  costs  (estimated  in  Step  1)  and  trade  flows  to  estimate  the  remaining  unknown  model  parameters,
$\theta_k$  and  $A^k_{ot}$.  These  parameters  are  important  inputs  for  Steps  3  to  6  below.


6  Empirical  Step  3:  Railroads  and  Price  Responsiveness 

The results in Steps 1 and 2 indicate that India’s railroad network significantly reduced trade costs
and expanded trade. I now investigate the impact of this change in the trading environment on
the prices of tradable commodities. In particular, following prediction 3 of the model, I test the
hypothesis that railroads reduced the responsiveness of local agricultural prices to local rainfall (an
exogenous determinant of local productivity). In a small open economy (SOE), price responsiveness
is zero since local prices are equal to the (exogenous) world price level. However, as trade costs rise
and an economy departs from the SOE limit, price responsiveness in that economy should rise (as
in prediction 3). The extent of price responsiveness in a district is therefore a novel and powerful

57Using  two  alternative  methods  Eaton  and  Kortum  (2002)  also  obtained  estimates  of  3.60  and  12.86. 
test of its openness to trade, which motivates the empirical exercise in this section.58 A second goal
of this section is to evaluate the performance of my model at predicting the price behavior seen in
the data in an ‘out-of-sample’ sense (because the model was estimated in Steps 1 and 2 using no
agricultural price information).


6.1  Empirical  Strategy 

Prediction  3  has  two  parts:  (1)  when  a  district  is  connected  to  the  railroad  network,  agricultural 
goods  prices  in  that  district  will  be  less  responsive  to  productivity  shocks  in  that  district;  and 
(2),  when  a  railroad  line  connects  two  districts,  agricultural  goods  prices  in  a  district  will  be  more 
responsive  to  productivity  shocks  in  the  other  district. 

I test this prediction by estimating the following linear specification (which can be interpreted as
a linear approximation to the model around a symmetric point):

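$$\begin{aligned}
\ln p^k_{dt} = {} & \beta^k_d + \beta^k_t + \beta_{dt} + \chi_1 RAIN^k_{dt} + \chi_2 RAIL_{dt} \times RAIN^k_{dt} + \chi_3 \frac{1}{N_d} \sum_{o \in N_d} RAIN^k_{ot} \\
& + \chi_4 \frac{1}{N_d} \sum_{o \in N_d} RAIN^k_{ot} \times RAIL_{odt} + \varepsilon^k_{dt}. \qquad (16)
\end{aligned}$$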


Here, $p^k_{dt}$ represents the retail price of agricultural crop $k$ in district $d$ and year $t$. $RAIN^k_{dt}$ is the
amount of crop-specific rainfall that fell in district $d$ in year $t$ (this variable is described in full in
section 5, where it was first used). The variable $RAIL_{dt}$ is a dummy variable equal to one when
the railroad network enters the boundary of district $d$, while the variable $RAIL_{odt}$ is a dummy
variable equal to one when it is possible to travel from district $o$ to district $d$ using only the railroad
network. Finally, the variable $RAIN^k_{ot}$ represents the amount of crop-specific rainfall in district
$o$, where district $o$ is a neighbor of district $d$: one of the $N_d$ districts ($o \neq d$) in district $d$’s
neighborhood (taken to be all districts that lie even partially inside a 250 km radius of district
$d$’s centroid).59 The summation terms in equation (16) are divided by the number of districts $N_d$
in the neighborhood to reflect an average effect.
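
The two neighborhood-average regressors can be sketched in Python as follows. This is a simplified illustration, not the construction used in the paper: it treats a district as a neighbor when its centroid lies within 250 km (rather than when any part of the district does), and the names `haversine_km` and `neighbor_terms`, the coordinates and the rainfall values are all hypothetical.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def neighbor_terms(d, centroids, rain, rail_link, radius_km=250.0):
    """Average neighbor rainfall, and average of neighbor rainfall interacted
    with the bilateral rail-connection dummy, for district d in one year."""
    neighbors = [o for o in centroids
                 if o != d and haversine_km(centroids[o], centroids[d]) <= radius_km]
    n = len(neighbors)
    if n == 0:
        return 0.0, 0.0
    avg_rain = sum(rain[o] for o in neighbors) / n
    avg_rain_rail = sum(rain[o] * rail_link[(o, d)] for o in neighbors) / n
    return avg_rain, avg_rain_rail

# Hypothetical districts: B and C lie within 250 km of A; D does not.
centroids = {"A": (25.0, 80.0), "B": (25.5, 80.5), "C": (26.0, 81.0), "D": (20.0, 75.0)}
rain = {"A": 0.8, "B": 1.0, "C": 0.5, "D": 2.0}   # crop-specific rainfall, metres
rail_link = {("B", "A"): 1, ("C", "A"): 0}        # is o connected to A by rail?

avg_rain, avg_rain_rail = neighbor_terms("A", centroids, rain, rail_link)
print(avg_rain, avg_rain_rail)
```

Dividing both sums by the same neighbor count keeps the two regressors on a comparable scale across districts with different numbers of neighbors.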

I  estimate  equation  (16)  using  fixed  effects  for  each  district-year  ($\beta_{dt}$),  which  control  for  any


58To  my  knowledge,  this  is  a  novel  test  for  assessing  a  change  in  market  integration.  However,  two  papers 
are  closely  related.  First,  Shiue  (2002)  examines  how  the  price  correlation  (over  many  years)  between  pairs  of 
markets  in  19th  Century  China  is  related  to  the  weather  correlation  (over  the  same  years)  in  these  pairs,  comparing 
this  correlation  in  inland  locations  to  that  along  rivers  or  the  coast.  Second,  Keller  and  Shiue  (2007a)  estimate 
formally,  in  the  same  setting,  how  the  spatial  dependence  of  weather  shocks  (on  prices)  varies  between  inland  and 
water-accessible  regions.  Neither  of  these  papers  focuses  on  the  responsiveness  of  local  prices  to  local  rainfall,  nor  on 
whether  the  spatial  transmission  of  weather  shocks  is  different  along  some  transportation  links  (such  as  railroads) 
than  along  others  (such  as  roads),  in  the  manner  I  do  here. 

59While in principle the rainfall in any district $o$ could affect prices in district $d$, my model suggests that these
effects are likely to die out quickly over distance. In a partial equilibrium sense (that is, without allowing for the land
rental rate $r_o$ to adjust), this can be seen easily in the model’s equation for prices. Here, each distant district’s productivity term $A^k_o$
affects local prices $p^k_d$ in a manner proportional to $(T^k_{od})^{-\theta_k}$, where $T^k_{od}$ is the trade cost (proportional to distance) and
$\theta_k$ was estimated in Step 2 as 3.8. I therefore restrict the effect of non-local rainfall on district $d$’s prices to that in
a short (250 km) range, though my results are insensitive to using smaller (eg 100 km) or larger (eg 500 km) ranges.
unobservable variables affecting prices that are constant across crops within a district and year.
This means that I identify price responsiveness through variation in how a given amount of annual
rainfall in a district affects each of that district’s crops differently. I also include fixed effects for
each district-crop ($\beta^k_d$) to control for unobservables that permanently affect a district’s productivity
of a given crop (such as the district’s soil type), and fixed effects for each crop-year ($\beta^k_t$) to control
for country-wide shocks to the price of each crop.

To the extent that rainfall is a significant determinant of productivity (as I found to be the case,
as revealed in trade flows, in Step 2), the coefficients $\chi_1$ and $\chi_3$ will be negative. Prediction 3 (a)
states that the coefficient $\chi_2$ is positive (prices in district $d$ are less responsive to rainfall in district
$d$ if district $d$ is on the railroad network). And prediction 3 (b) states that the coefficient $\chi_4$ is
negative (lower transport costs should make prices in district $d$ more responsive to rainfall shocks
in districts neighboring $d$). A positive coefficient $\chi_2$ is consistent with railroads increasing the
extent of market integration in India.


6.2  Data 


I estimate equation (16) using annual data on the retail price of 17 agricultural commodities, in
239 districts, from 1861-1930. These prices were collected by district officers who visited the 10-15
largest retail markets in each district once every two weeks. India-wide instructions were issued to
each province to ensure that prices of each commodity were recorded in a consistent manner across
the provinces. The other variables used to estimate equation (16), concerning railroads and rainfall,
were described in sections 4 and 5, respectively.


6.3  Results 


Table  4  presents  results  from  OLS  estimates  of  equation  (16).  Column  1  begins  by  regressing  (log) 
agricultural  prices  in  a  district  on  the  district’s  crop-specific  local  rainfall.  The  coefficient  on  local 
rainfall  is  negative  and  statistically  significant,  suggesting  that  rainfall  has  a  positive  impact  on  crop 
output,  and  this  increase  in  supply  transmits  into  local  retail  prices.  This  is  indicative  of  imperfect 
market  integration  in  these  agricultural  commodities  on  average  over  the  time  period  1861-1930  in 
India.  The  coefficient  estimate  implies  a  large  amount  of  price  responsiveness  on  average  over  the 
period:  a  one  standard  deviation  (ie  0.604  m)  increase  in  a  crop’s  crop-specific  rainfall  decreases 
that  crop’s  prices  by  approximately  15  percent. 

Column 2 of table 4 then tests the first part of prediction 3: that a district’s prices will be less
responsive to local rainfall after the district is connected to the railroad network. In this
specification the coefficient on local rainfall ($\chi_1$ in equation (16)) represents price responsiveness before
railroads penetrate a district. The estimated coefficient is negative, statistically significant, and
demonstrates a great deal of price responsiveness in the pre-railroad era. Further, in line with
prediction 3, the coefficient $\chi_2$ on rainfall interacted with a dummy for railroad access ($RAIL_{dt}$) is positive
and statistically significant. The sum of the coefficients $\chi_1$ and $\chi_2$ represents the extent of price
responsiveness after the district is brought into the railroad network. The estimated coefficients
sum to -0.014, which implies that prices are still responsive to local rainfall, but in a dramatically
reduced sense when compared to the coefficient of -0.428 that measures price responsiveness in the
pre-rail era. However, I cannot reject the null of zero price responsiveness in the post-rail era. These
findings suggest that the imperfect market integration from 1861-1930 found in column 1 reflects
an average of two extreme regimes separated by the arrival of a railroad line in a district: a first
regime of imperfect integration before the railroad arrives (where local supply shocks have large
effects on local prices), and a second regime of near-perfect integration after the railroad arrives
(where local supply shocks have a negligible effect on local prices).
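
The magnitudes in the two regimes can be made concrete with a short calculation. This sketch uses only figures quoted in this section (the pre-rail coefficient, the post-rail sum, and the 0.604 m standard deviation of crop-specific rainfall); the variable names are illustrative, and the implied interaction coefficient is backed out from those figures rather than read from table 4.

```python
pre_rail = -0.428    # coefficient on local rainfall before rail access
post_rail = -0.014   # sum of the rainfall and interaction coefficients
sd_rain_m = 0.604    # one standard deviation of crop-specific rainfall, metres

# Interaction coefficient implied by the two reported numbers.
chi_2 = post_rail - pre_rail

# Log-point price response to a one-s.d. rainfall increase in each regime.
effect_pre = pre_rail * sd_rain_m
effect_post = post_rail * sd_rain_m

# Share of pre-rail responsiveness that remains after connection.
share_remaining = post_rail / pre_rail

print(round(chi_2, 3), round(effect_pre, 3), round(effect_post, 4),
      round(share_remaining, 3))
```

On these numbers, a one standard deviation rainfall shock moves local prices by roughly a quarter of a log point before the railroad arrives and by less than one hundredth of a log point afterwards.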

Column  3  repeats  the  specification  in  column  1,  but  with  the  inclusion  of  average  rainfall  in 
neighboring  districts.  The  effect  of  neighboring  districts’  rainfall  on  local  prices  is  negative  and 
statistically  significant,  which  implies  that,  on  average  over  the  period  from  1861-1930,  neighboring 
districts’  supply  shocks  affected  local  prices,  as  is  consistent  with  some  degree  of  market  integration. 

However, the estimates in column 4 demonstrate that, as was the case in column 2, the average
effect in column 3 is masking the behavior of two different regimes. Column 4 estimates equation
(16) in its entirety by including an interaction term between each neighboring district’s rainfall
and a dummy variable for whether that district is connected to the ‘local’ district by railroad (ie
$RAIL_{odt}$). As is consistent with the second part of prediction 3, the coefficient on this interaction
term is negative and statistically significant. Furthermore, the coefficient on neighboring districts’
rainfall (which is now the effect of rainfall in districts not connected by railroad to the local district)
is not significantly different from zero. Column 4 therefore suggests that local prices do respond
to neighboring districts’ supply shocks when those neighbors are connected to the local district by
railroads; however, neighboring districts’ supply shocks are irrelevant to local prices when there is
no railroad connection.

To summarize the results from this section, I find that railroads played a dramatic role in
facilitating market integration, as revealed by price responsiveness, among the 17 agricultural goods
in my sample. This is consistent with both parts of prediction 3 of the model, and suggests that
railroads significantly improved the trading environment in colonial India.


6.4  Model  Evaluation 

The  results  in  table  4  find  significant  support  for  the  price  responsiveness  relationships  in  prediction 
3.  A  more  direct  test  of  the  model’s  predictions  on  price  behavior  is  to  compare  prices  in  the  model 
(for  each  district,  year  and  crop)  to  those  in  the  data.  To  do  this  I  estimate  the  regression: 

$$\ln p^k_{dt} = \beta^k_d + \beta^k_t + \beta_{dt} + \varpi \ln \hat{p}^k_{dt} + \varepsilon^k_{dt}, \qquad (17)$$


where $p^k_{dt}$ is the observed set of prices and $\hat{p}^k_{dt}$ is the predicted set of prices. If the model is specified
correctly then the coefficient $\varpi$ should be equal to one. I include the fixed effects $\beta^k_d$, $\beta^k_t$ and $\beta_{dt}$ in
this specification in order to test the model using the same variation as in equation (16).
I estimate this equation using the same data as described earlier in this section. In order to
compute predicted prices ($\hat{p}^k_{dt}$), I solve the model completely (in each period) using the estimated
parameters from Steps 1 and 2 and the observed exogenous variables (railroads, rainfall and land
area) in each period.60

I find an estimated coefficient of $\varpi$ = 0.913 (with a standard error of 0.189), which implies that
the model has significant predictive power. Since the model’s parameters are estimated using data
that did not include the agricultural price data that I use to estimate equation (17) here, this
constitutes an ‘out-of-sample’ test of the model at which the model performs well. This increases my
confidence that the model is capable of replicating features of the data, and in its ability to inform
the mechanisms behind my reduced-form estimates (a use to which I put the model in Step 6).
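
The two natural hypothesis tests on this coefficient can be worked through from the reported estimate and standard error alone. This is a back-of-the-envelope sketch: the variable names are illustrative, and the 1.96 critical value assumes a normal approximation rather than whatever inference procedure underlies the reported standard error.

```python
varpi_hat = 0.913   # estimated coefficient on predicted log prices
se = 0.189          # its reported standard error

t_vs_one = (varpi_hat - 1.0) / se   # null: the model is correctly specified
t_vs_zero = varpi_hat / se          # null: the model has no predictive power
ci_95 = (varpi_hat - 1.96 * se, varpi_hat + 1.96 * se)

print(round(t_vs_one, 2), round(t_vs_zero, 2), tuple(round(b, 3) for b in ci_95))
```

Under that approximation, a coefficient of zero is rejected while a coefficient of one is not, and the 95 percent interval contains one.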


7  Empirical  Step  4:  Railroads  and  Real  Income  Levels 

Steps 1 to 3 have established that Indian railroads significantly reduced trade costs, expanded trade,
and reduced price responsiveness. These findings suggest that railroads dramatically changed the
trading environment in India. I now go on to investigate the welfare consequences of railroad
expansion in India by estimating the effect of railroads on real income levels (in this section) and real
income volatility (in Step 5, the next section).

7.1  Empirical  Strategy 

Prediction  4  of  my  model  states  that  a  district’s  real  income  will  increase  when  it  is  connected 
to  the  railroad  network,  but  that  its  real  income  will  fall  as  one  of  its  neighbors  is  connected  to 
the  railroad  network  (holding  its  own  access  constant).  These  predictions  motivate  an  estimating 
equation  of  the  form 


$$\ln\left(\frac{r_{ot}}{P_{ot}}\right) = \beta_o + \beta_t + \gamma RAIL_{ot} + \psi \frac{1}{N_o} \sum_{d \in N_o} RAIL_{dt} + \varepsilon_{ot}. \qquad (18)$$

In this estimating equation, $r_{ot}/P_{ot}$ represents real agricultural income per acre in district $o$ and year
$t$. In my model, $r_{ot}$ is the nominal land rental rate, but I have been unable to find systematic data
on land rents in this time period. However, in my model, nominal land rents are equal to nominal
output per unit area, on which data was collected in the agricultural sector (the dominant sector of
the colonial Indian economy), so I use this to measure $r_{ot}$.61

60This  is  laid  out  in  more  detail  in  section  9,  when  I  use  a  similar  procedure.

61Real  income  per  acre  is  equal  to  welfare  (for  a  representative  agent)  in  my  model,  but  may  not  be  in  my 
empirical  setting  because  output  per  acre  may  diverge  from  output  per  capita  if  the  population  of  each  district  is 
endogenous,  and  related  to  railroad  expansion.  Population  could  be  endogenous  for  two  reasons.  First,  fertility  and 
mortality  may  have  been  endogenous  in  the  setting  of  colonial  India — in  a  Malthusian  limit,  fertility  and  mortality 
would  adjust  to  any  agricultural  productivity  improvements  (due  to  railroads)  and  hold  output  per  capita  constant. 
However,  the  potential  for  endogenous  fertility  and  mortality  responses  is  likely  to  vary  from  setting  to  setting,
so  while  an  effect  of  railroads  on  output  per  acre  is  transferable  to  alternative  settings,  an  effect  on  output  per


The denominator, $P_{ot}$, is a consumer price index.

The first regressor in equation (18) is $RAIL_{ot}$, a dummy variable that is equal to one in all years $t$
in which some part of district $o$ is on the railroad network. The summation term captures the effect
of railroads in other, neighboring districts on the level of real income in the district of observation
$o$.63 Finally, I estimate equation (18) using fixed effects at the district ($\beta_o$) and year ($\beta_t$) levels,
so that the effect of railroads is identified entirely from variation within districts over time, after
accounting for common macro shocks affecting all districts. The district fixed effect is particularly
important because it controls for permanent features of districts that may have made them both
agriculturally productive, and attractive places to build railroads.
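
A regression with district and year fixed effects of this kind can be sketched on simulated data as follows. This is an illustration under stated assumptions, not the paper's estimation code: the panel is balanced (so that double demeaning is exactly equivalent to including both sets of dummies), the neighbor-access regressor is drawn at random rather than built from a map, and the "true" coefficients of 0.2 and -0.05 are arbitrary values chosen only to check that the estimator recovers them.

```python
import numpy as np

def two_way_demean(a):
    """Remove district (row) and year (column) means from a balanced panel."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

def within_ols(y, regressors):
    """OLS with district and year fixed effects via the within transformation."""
    Y = two_way_demean(y).ravel()
    X = np.column_stack([two_way_demean(x).ravel() for x in regressors])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n_districts, n_years = 40, 20

# Each district gains rail access in a (hypothetical) staggered arrival year.
arrival = rng.integers(1, n_years, size=n_districts)
rail = (np.arange(n_years)[None, :] >= arrival[:, None]).astype(float)
neighbor_share = rng.uniform(size=(n_districts, n_years))  # stand-in regressor

gamma_true, psi_true = 0.2, -0.05   # arbitrary illustrative values
log_real_income = (rng.normal(size=(n_districts, 1))       # district effects
                   + rng.normal(size=(1, n_years))         # year effects
                   + gamma_true * rail + psi_true * neighbor_share
                   + 0.01 * rng.normal(size=(n_districts, n_years)))

gamma_hat, psi_hat = within_ols(log_real_income, [rail, neighbor_share])
print(round(gamma_hat, 3), round(psi_hat, 3))
```

Because the district and year effects are differenced out, the coefficients are identified only from changes in access within a district over time relative to the common yearly movement, which is the source of variation described in the text.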

Prediction 4 states that the coefficient $\gamma$ on district $o$’s own railway access will be positive, but
the coefficient $\psi$ on district $o$’s neighbors’ access will be negative. A number of alternative theories
(whether stressing trade mechanisms or otherwise) could make similar predictions about the signs
of these coefficients (especially about $\gamma$). For this reason, in Step 6 I go beyond the qualitative test
of my model provided by the signs of $\gamma$ and $\psi$ and assess the quantitative performance of the model
in predicting real income changes due to the expansion of the railroad network.


I begin below (in section 7.3) by estimating equation (18) using OLS. Unbiased OLS estimates
require there to be no correlation between the error term ($\varepsilon_{ot}$) and the regressors ($RAIL_{ot}$ and
$\frac{1}{N_o}\sum_{d \in \mathcal{N}_o} RAIL_{dt}$), conditional on the district and year fixed effects. This requirement would
fail if railroads were built in districts and years that were expected to experience real agricultural
income growth, or if railroads were built in districts that were on differing unobserved trends from
non-railroad districts. For this reason I pursue three strategies to assess the potential magnitude of
bias in my OLS results due to non-random railroad placement: four placebo specifications (section
7.4), instrumental variable estimates (section 7.5), and a bounds check (section 7.6).


capita  is  potentially  less  so.  Second,  migration  could  respond  to  differential  productivity  improvements  over  space. 
Migration,  however,  was  extremely  limited  in  colonial  India  when  compared  to  other  countries  in  the  same  time 
period  (a  feature  that  is  still  true  today,  and  that  Munshi  and  Rosenzweig  (2007)  argue  is  due  to  informal  insurance 
provided  by  localized  caste  networks),  and  the  little  migration  that  occurred  was  vastly  skewed  toward  women 
migrating to marry (Davis 1951, Rosenzweig and Stark 1989). Nevertheless, to test the hypothesis that the limited
migration  was  correlated  with  railroad  construction  I  have  collected  data  on  district-to-district  bilateral  migration  as 
revealed  from  birthplaces,  recorded  in  the  decadal  censuses  in  colonial  India.  I  find  (in  OLS  regressions  that  control 
for  district  pair  fixed-effects)  that  there  is  no  statistically  significant  net  migration  into  districts  receiving  a  railroad 
line  from  neighboring  districts  left  off  the  railroad  network  (so  migration  is  unlikely  to  have  been  strong  enough 
to  act  to  equalize  output  per  capita),  and  that  bilateral  railroad  connections  between  districts  do  not  statistically 
significantly  correlate  with  bilateral  migration  between  districts  (so  the  railroads  do  not  seem  to  have  facilitated 
migration).  However,  I  do  not  observe  intra-district  migration,  which  may  have  been  significant. 

62In  the  model  this  price  index  is  given  in  equation  §•  However,  it  would  be  unsurprising  if  a  price  index 
calculated  in  the  manner  suggested  by  my  theory  fits  my  model  well.  To  perform  a  more  powerful  test  of  the  model 
I  therefore  use  a  flexible  price  index  (the  Fisher  ideal  price  index)  of  the  sort  that  is  commonly  used  to  construct 
real  GDP  measures  from  national  income  statistics. 

63 As in section 6, I take the neighborhood of district o (denoted by $\mathcal{N}_o$, and containing $N_o$ districts) to consist of
any  districts  for  which  any  part  of  the  district  lies  within  a  250  km  radius  of  the  centroid  of  district  o. 
7.2  Data


I estimate equation (18) using annual data on real agricultural income (per acre of land) in 239
districts, from 1870 to 1930. This variable (nominal agricultural GDP computed from
17 crops, deflated by a consumer price index and then divided by the district’s land area) was
described briefly in section 2 and in more detail in Appendix A. The variables $RAIL_{ot}$ and $RAIL_{dt}$
were described in section 5.


7.3  Results 


Column 1 of table 5 presents OLS estimates of equation (18), with only the own-district regressor
included. The coefficient estimate is 0.164, implying that in the average district, the arrival of the
railroad network raised real agricultural income by over sixteen percent.
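
As a small worked example of how such a coefficient maps into a percentage effect, the sketch below converts a coefficient on a dummy variable into an exact percentage change. It assumes that the dependent variable in equation (18) is the logarithm of real income, which is an assumption here because the equation itself is not reproduced in this section; the function name is likewise hypothetical.

```python
import math

def log_points_to_percent(coef):
    """Exact percentage change implied by a dummy coefficient in a log-outcome regression."""
    return 100.0 * (math.exp(coef) - 1.0)

if __name__ == "__main__":
    # 0.164 is the column 1 estimate quoted in the text.
    print(round(log_points_to_percent(0.164), 1))
```

Under that assumption the exact effect is a little under 18 percent, slightly above the 16.4 percent obtained by reading the coefficient directly as log points, which is consistent with the statement that the effect is "over sixteen percent".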

Prediction  4,  along  with  the  customs  union  literature  in  international  trade  theory,  predicts  that 
a  district  can  suffer  from  trade  diversion  when  one  or  more  of  its  trade  partners  gains  improved 
access  to  a  third  region’s  market.  Because  the  arrival  of  the  railroad  network  is  spatially  correlated, 
the  specification  in  column  1  may  confound  the  positive  effects  of  a  district’s  own  access  to  the 
railroad  network  with  the  negative  effect  of  access  by  its  neighbors.  Column  2  of  table  5  checks 
for  this  negative  effect  of  railroads  by  including  as  an  additional  regressor  the  extent  to  which  a 
district’s neighbors are connected to the railroad network (as in equation (18)). The coefficient on
this additional variable is negative and statistically significant, indicating that losses from trade
diversion are at work when a district’s trading partners reduce trade costs among themselves but not
with the district of observation. In addition, the coefficient on own-district railroad access is higher in
column 2 than in column 1: the point estimate (which is still statistically significant) now suggests
that railroad access increases real agricultural income per acre by over 18 percent.

The  results  from  including,  in  column  2,  neighbors’  railroad  access  highlight  that  railroad  projects 
had  a  treatment  externality  on  untreated  districts.  It  is  important  to  control  for  this  treatment 
externality  to  prevent  bias  in  estimates  of  the  effect  of  railroad  access.  This  result  also  highlights 
potential  distributional  consequences  of  railroad  construction,  whereby  a  project  that  is  good  for 
one  region  may  be  bad  for  its  neighbors. 
The OLS results described here are in line with Prediction 4 and suggest that railroads had a
large  effect  on  real  income  in  India.  In  the  following  three  sections  I  pursue  three  strategies  to 
explore  the  robustness  of  these  findings  to  concerns  over  the  non-random  placement  of  railroads. 


64My  estimates  suggest  that  the  districts  where  railroads  are  built  could  feasibly  make  transfers  to  their  neighbors 
that  would  compensate  neighbors  while  leaving  the  constructing  district  better  off.  Unfortunately,  I  have  been 
unable  to  find  any  data  that  would  shed  light  on  the  extent  of  such  transfers.  Duflo  and  Pande  (2007)  use  data  from 
a  household  consumption  survey  to  measure  the  effect  of  dam  construction  on  consumption  and  poverty;  the  use 
of  consumption  data,  which  was  recorded  after  any  potential  compensating  transfers,  allows  these  authors  to  argue 
that  compensation  of  districts  harmed  by  dam  construction  appears  to  be  incomplete  (and  especially  incomplete 
in  districts  with  a  history  of  relatively  more  extractive  institutions).  Unfortunately,  such  household  consumption 
surveys  only  began  in  India  in  1950. 
7.4  Four  Placebo  Specifications


Empirical  Strategy: 

The  first  strategy  I  use  to  mitigate  concerns  of  bias  due  to  non-random  railroad  placement  is  to 
estimate  the  effects  of  ‘placebo’  railroad  lines:  over  40,000  km  of  railroad  lines  that  went  through 
various  stages  of  the  planning  process,  but  were  never  actually  built.65  I  group  these  placebo  lines 
into  four  categories: 


1. Four-stage planning hierarchy: From 1870 to 1947, India’s Railways Department used one constant
system for the evaluation of new railroad projects.66 Line proposals received from the
Indian and provincial governments would appear as proposed in the Department’s annual
Railway Report. This invited further discussion, and if the proposed line survived this criticism
it would be reconnoitered. Provided this reconnaissance uncovered no major problems,
every meter of the proposed line would then be surveyed, this time in painstaking and costly
detail (usually taking several years to complete).67 These detailed surveys would provide
accurate estimates of expected construction costs, and lines whose surveys revealed modest
costs would then be passed on to the Government to be sanctioned, or given final approval.
The railroad planning process was therefore arranged as a four-stage hierarchy of tests that
proposed lines had to pass.68 I estimate equation (18), but additionally include regressors for
railroad lines abandoned at each of these four planning stages (with separate coefficients on
each). If line placement decisions were driven by unobservable determinants of agricultural
income, it is likely that unbuilt lines would exhibit spurious effects (relative to the excluded
category, areas in which lines were never even discussed) on agricultural income in OLS regressions.
Further, it is likely that the lines that reached later planning stages would exhibit
larger spurious effects than the lines abandoned early on (because higher expected benefits
would be required to justify the increasingly costly survey process). The absence of such a
pattern would cast doubt on the extent to which India’s Railways Department was selecting
districts for railroad projects on the basis of correlation with the error term in equation (18).


2.  Lawrence’s  proposal:  In  1868,  Viceroy  John  Lawrence  (head  of  the  Government  of  India) 
proposed  and  had  surveyed  a  30-year  expansion  plan,  broken  into  5-year  segments,  that  would 

65This  strategy  is  similar  in  spirit  to  that  in  Greenstone  and  Moretti  (2004),  who  study  the  welfare  impact  of 
large  industrial  plants  in  the  United  States.  They  compare  economic  outcomes  in  the  counties  where  these  plants 
were  built  to  outcomes  in  the  plant’s  second-choice  county  (where  the  plant  was  not  built). 

66Strachey  and  Strachey  (1882)  review  the  early  history  of  the  Railways  Department  (part  of  the  Department  of 
Public  Works  until  1878).  The  Railway  Department’s  annual  Railway  Reports  describe  the  planning  system  in  each 
year  through  to  1947. 

67 Reconnaissance was a form of low-cost survey of possible track locations (typically within 100 m of their eventual
location),  along  with  a  statement  of  all  necessary  bridges,  tunnels,  cuttings  and  embankments.  As  Davidson  (1868) 
and  Wellington  (1877)  make  clear,  surveying  was  much  more  detailed,  as  its  end  goal  was  to  identify  the  exact 
position  of  the  intended  lines,  and  a  precise  statement  of  all  engineering  works  (down  to  the  number  of  bricks 
required  to  build  each  bridge). 

68This  process  of  sequentially  more  detailed  investigation  is  echoed  in  Wellington  (1877),  the  standard  textbook 
for  railroad  engineers  and  surveyors,  in  all  English-speaking  countries,  in  its  day  (which  ran  to  six  editions  by  1906). 
begin where Dalhousie’s trunk lines (described in section 2.3) left off.69 Lawrence consulted
widely about the optimal routes for this railroad expansion, and drew upon his twenty-six
years  of  experience  as  an  administrator  in  India.  Upon  his  retirement  (from  his  fixed,  five-year 
term)  in  1869,  construction  on  Lawrence’s  plan  had  just  begun.  But  Lawrence’s  successor,  the 
Earl  of  Mayo,  immediately  halted  construction  and  vetoed  Lawrence’s  proposal.  Mayo  was 
an  outsider  (who  had  never  been  to  India  before  his  appointment)  and  a  fiscal  conservative, 
and  he  wasted  no  time  in  criticizing  the  high  costs  of  railroad  construction  in  India.  Instead, 
Mayo  followed  a  more  cautious  approach  to  railroad  expansion  and  Lawrence’s  plan  was  never 
built.  However,  Lawrence’s  plan  provides  a  useful  window  on  the  trajectory  that  he  and  his 
Government  expected  in  the  districts  where  they  planned  to  expand  the  railroad  network. 
If anyone was capable of forecasting developments in each district’s trading environment,
developments that may be correlated with the error term in equation (18), it was likely to
be Lawrence. To check for this, I estimate equation (18) and additionally include lines that
were part of Lawrence’s proposal. Because Lawrence’s proposal was broken into six five-year
segments, I allow for separate coefficients on each of these segments and assume that the
stated  lines  in  a  given  five-year  period  would  have  opened  at  the  beginning  of  the  period.  This 
provides  an  additional  check:  lines  that  Lawrence  proposed  to  be  built  in  relatively  early  time 
segments  were  presumably  more  attractive,  higher  priority  proposals,  that  in  addition  were 
made  under  a  shorter  forecast  horizon.  Therefore,  to  the  extent  that  Lawrence  was  able  to 
forecast  district-level  developments,  larger  spurious  effects  should  be  found  on  these  segments. 

3.  Bombay  and  Madras  Chambers  of  Commerce  proposals:  In  1883,  the  Bombay  and  Madras 
Chambers  of  Commerce  (bodies  representing  commercial  interests)  were  invited  to  submit 
railroad  expansion  proposals.  Their  proposals  recommended  railroad  expansion  into  areas 
with  unrealized  commercial  potential  (where  the  Chambers’  interests  lay).  However,  the 
Chambers’  proposals  were  dismissed  for  paying  too  little  attention  to  the  potential  costs 
of  building  these  lines  (costs  that  the  Chambers  would  not  incur).  Because  it  is  plausible 
that the Chambers possessed a great deal of expertise in the identification of commercial
opportunities,70 the Chambers’ expansion proposals provide a unique window on the expected
commercial trajectory in the regions where the Chambers recommended construction. I
estimate equation (18) and additionally include lines that were mentioned in the Bombay and
Madras Chambers of Commerce proposals. If the expected commercial trajectories identified
by these Chambers are correlated with the error term in equation (18), then unbuilt lines in
the Chambers’ proposals should display spurious effects on real agricultural income. If no
such effects are observed then this would call into question the ability of less commercially
interested agents, such as the Government of India (which planned India’s railroad network),

69These  segments  appear  in  the  plan  (published  in  1868)  as  “to  be  built  over  the  next  5  years”,  “to  be  built 
between  6  and  10  years  from  now”,  etc. 

70The  potential  for  such  expertise  is  clear  in  histories  of  the  Bengal,  Madras,  Upper  India,  and  Indian  Chambers  of 
Commerce  in  Tyson  (1953),  Times  of  India  (1938),  Tirumalai  (1986),  and  Namjoshi  and  Sabade  (1967),  respectively. 
to systematically forecast commercial developments in India’s districts.


4.  Kennedy’s  proposal:  India’s  early  line  placement  followed  the  suggestions  of  Lord  Dalhousie 
(then  head  of  the  Government  of  India),  but  only  after  Dalhousie’s  decade-long  debate  with 
Major  Kennedy  (then  India’s  Chief  Engineer,  who  was  charged  with  planning  India’s  first 
railroad  lines)  over  optimal  route  choice.  Kennedy  was  convinced  that  railroad  construction 
would be extremely expensive in India (Davidson 1868). He therefore sought to connect
Dalhousie’s chosen provincial capitals with a network of lines that followed the gentlest possible
gradients, along river gradients and the coastline wherever possible.71 Kennedy’s proposal
is useful for my identification strategy because it identifies districts with low railroad
construction costs. Geographical features that favor low construction costs (such as topography,
vegetation, and climate) may also favor agricultural production, and may result in differential
unobservable trends in the real agricultural income of districts with favorable construction
conditions; if favorable construction conditions drove railroad placement decisions then OLS
estimates of equation (18) would erroneously attribute unobserved trends to railroad construction.
I therefore estimate equation (18) including a variable that is an interaction between
an indicator variable that captures districts that would have been penetrated by Kennedy’s
proposed network and a time trend.72 If this variable predicts real agricultural income then
this would be a concern for my identification strategy as it would suggest that the features
that Kennedy found favorable for railroad construction (features that are presumably just
as favorable to his successors) are correlated with real agricultural income growth. Because
Kennedy subdivided his proposal into high and low priority lines I also look for differential
trends across these designations.
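
The logic of the first placebo specification can be sketched in a short simulation. The code below is a hypothetical illustration, not the estimation code or data used for table 6: it assigns simulated districts to built lines, to lines abandoned at one of the four planning stages, or to the excluded "never discussed" category, lets only built lines affect income, and then estimates all five coefficients with district and year fixed effects. The round-robin assignment, the assumed opening years for unbuilt lines, the true effect of 0.18, and all names are assumptions.

```python
import random

STAGES = ["built", "proposed", "reconnoitered", "surveyed", "sanctioned"]

def two_way_demean(panel):
    """Remove district (row) and year (column) means from a balanced panel."""
    n_o, n_t = len(panel), len(panel[0])
    row_mean = [sum(row) / n_t for row in panel]
    col_mean = [sum(panel[o][t] for o in range(n_o)) / n_o for t in range(n_t)]
    grand = sum(row_mean) / n_o
    return [[panel[o][t] - row_mean[o] - col_mean[t] + grand
             for t in range(n_t)] for o in range(n_o)]

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def placebo_regression(n_districts=60, n_years=30, noise_sd=0.05, seed=0):
    rng = random.Random(seed)
    categories = STAGES + [None]  # None: line never discussed (excluded category)
    x = {s: [[0.0] * n_years for _ in range(n_districts)] for s in STAGES}
    y = [[0.0] * n_years for _ in range(n_districts)]
    t_fe = [rng.gauss(0, 1) for _ in range(n_years)]
    for o in range(n_districts):
        status = categories[o % len(categories)]
        start = rng.randrange(2, n_years - 2)  # year the line opened, or would have opened
        d_fe = rng.gauss(0, 1)
        for t in range(n_years):
            on = 1.0 if (status is not None and t >= start) else 0.0
            if status is not None:
                x[status][o][t] = on
            effect = 0.18 * on if status == "built" else 0.0  # only built lines matter
            y[o][t] = d_fe + t_fe[t] + effect + rng.gauss(0, noise_sd)
    yd = [v for row in two_way_demean(y) for v in row]
    xd = [[v for row in two_way_demean(x[s]) for v in row] for s in STAGES]
    k = len(STAGES)
    xtx = [[sum(a * b for a, b in zip(xd[i], xd[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * c for a, c in zip(xd[i], yd)) for i in range(k)]
    return dict(zip(STAGES, solve(xtx, xty)))

if __name__ == "__main__":
    for stage, coef in placebo_regression().items():
        print(stage, round(coef, 3))
```

In this simulation placement is unrelated to the error term by construction, so the placebo coefficients should be close to zero. If unbuilt lines were instead assigned to districts with favorable unobserved trends, their coefficients would pick up spurious effects, which is the pattern the four specifications above are designed to detect.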


Results: 

Table  6  presents  estimates  of  the  four  placebo  specifications  described  above.  Column  1  compares 
the  effect  of  railroad  lines  that  were  actually  built  to  unbuilt  railroad  lines  that  were  abandoned 
at various stages of the four-stage planning hierarchy. The coefficients on unbuilt lines are never
statistically significantly different from zero, nor of the same order of magnitude as the coefficient on
built lines. Importantly, the coefficients on each hierarchical stage of the approval process do not display
a tendency to increase as the lines reach more advanced stages of the planning process.

Column 2 looks for spurious effects from lines identified in Lawrence’s proposal. The coefficients
on the lines that he proposed are all close to zero, an order of magnitude smaller than the coefficient

71The  network  that  was  built,  by  contrast,  took  straight  lines  in  almost  all  circumstances,  requiring  in  many 
cases (such as the Thal and Bhor Ghats) some of the most advanced railroad engineering works the world had
ever  seen  (Andrew  1883).  By  1869  it  was  clear  that  Kennedy’s  anticipated  construction  costs  were,  if  anything, 
underestimates.  These  high  construction  costs  were  a  major  factor  in  Mayo’s  decision  to  abort  Lawrence’s  plan,  as 
described  in  my  second  placebo  variable. 

72 Since Kennedy’s proposal was first submitted in 1848, but my real agricultural income data begins in 1870, I
cannot  estimate  the  contemporaneous  impact  of  Kennedy’s  proposed  lines  in  the  same  manner  as  my  other  three 
placebo  specifications. 
on built lines, and never statistically significant. Further, the estimated coefficients on Lawrence’s
early  proposals  are  no  larger  on  average  than  those  on  his  later  proposals. 

Column  3  performs  a  similar  exercise  using  lines  chosen  by  the  Bombay  and  Madras  Chambers  of 
Commerce.  The  coefficients  on  the  two  Chambers’  proposed  lines  are  positive  but  very  close  to  zero 
and  not  statistically  significantly  different  from  zero.  And,  as  in  column  2,  these  coefficients  are  an 
order  of  magnitude  smaller  than  the  (statistically  significant)  coefficients  on  built  railroad  lines. 

Finally, column 4 examines the extent to which districts identified in Major Kennedy’s proposal,
as  inexpensive  districts  in  which  to  construct  railroads,  display  different  real  agricultural  income 
trends  from  other  districts.  The  coefficients  on  Kennedy’s  two  types  of  identified  lines  (high  and 
low  priority)  are  both  close  to  zero  and  not  statistically  significantly  different  from  zero.  Crucially, 
the  inclusion  of  this  variable  does  not  change  appreciably  the  coefficient  on  built  railroads. 

These  four  sets  of  results  display  a  consistent  pattern.  Regardless  of  the  expert  choosing  potential 
railroad  lines  (India’s  public  works  department,  India’s  most  senior  administrator  at  the  height  of  his 
26-year  career,  commercial  interest  groups,  or  India’s  chief  engineer),  or  the  motivation  for  doing  so 
(lines  attractive  to  the  government  for  many  potential  reasons,  commercially  attractive  lines,  or  low 
costs of construction), unbuilt lines identified by these experts are uncorrelated with time-varying
unobservable determinants of real agricultural income growth. These results cast doubt on the extent
to  which  the  Government  of  India  was  willing  or  able  to  allocate  railroads  to  districts  on  the  basis 
of  their  expected  evolution  (or  factors  correlated  with  this  evolution)  in  real  agricultural  income. 

7.5  Instrumental  Variable  Estimates 

Empirical  Strategy: 

After  the  1876-78  famine  in  India,  an  official  UK  parliamentary  commission — the  1880  Indian 
Famine  Commission — met  in  London  to  inquire  into  the  causes  of  this  famine,  and  how  future 
famines  might  be  prevented.  Of  the  11  commissioners,  nine  were  Members  of  Parliament,  and  no 
commissioner  possessed  particular  expertise  in  Indian  railroads  (Bhatia  1967).  Nevertheless,  the 
Commission  was  unique  among  previous  and  subsequent  famine  commissions  in  recommending  that 
railroads  could  prevent  famine.  Regions  that  received  inadequate  rainfall  (and  therefore  suffered 
from famine) in the 1876-78 agricultural years were highlighted for railroad construction.73

This Commission’s recommendation motivates my instrumental variable (IV) approach. I
instrument for railroad construction in a district with a variable that is an interaction between the
deviation of rainfall in the district in the 1876-78 agricultural years (May 1876 to April 1878) from
its long-run (1870-1930) mean (over pairs of agricultural years),74 and an indicator for the
post-1884 period.75 I demonstrate below that this variable has significant predictive power for railroad

73The  1880  Commission  argued  that  the  1876-78  famine  had  been  exacerbated  by  slow  transportation  of  food 
into  famine-stricken  districts. 

74For  simplicity,  I  use  the  total  amount  of  rainfall  that  fell  in  this  period.  However,  I  obtain  similar  results  when 
I instead use a weighted average over the 17 crop-specific rainfall variables introduced in section 6, with weights
suggested by the model (and introduced in section 8 below).

75I  allow  four  years  for  the  Commission’s  recommended  lines  to  be  constructed  because  the  average  length  of  time 
construction in a district.
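
One way to read the construction of this instrument is sketched below. The data layout (a mapping from the calendar year in which each May-to-April agricultural year starts to total district rainfall), the use of overlapping two-year windows for the long-run mean, the strict inequality that defines the post-1884 period, and the function names are all assumptions made for illustration; the text above does not pin these details down.

```python
def pair_total(rain, first_year):
    """Rainfall summed over two consecutive agricultural years starting in first_year."""
    return rain[first_year] + rain[first_year + 1]

def famine_instrument(rain, year, first=1870, last=1930, shock_start=1876, cutoff=1884):
    """Rainfall deviation in the 1876-78 agricultural years times a post-cutoff indicator.

    rain maps the starting calendar year of each agricultural year (May to April)
    to total district rainfall; this layout is an assumption.
    """
    # Long-run mean over pairs of agricultural years (overlapping pairs, an assumption).
    pairs = [pair_total(rain, y) for y in range(first, last)]
    long_run_mean = sum(pairs) / len(pairs)
    deviation = pair_total(rain, shock_start) - long_run_mean
    post = 1.0 if year > cutoff else 0.0  # strict inequality is an assumption
    return deviation * post

if __name__ == "__main__":
    # Hypothetical district: 100 units of rain each year, with a drought in 1876-78.
    rain = {y: 100.0 for y in range(1870, 1931)}
    rain[1876], rain[1877] = 60.0, 70.0
    print(famine_instrument(rain, 1880), round(famine_instrument(rain, 1890), 2))
```

The instrument is zero in every year up to the cutoff and equal to a time-invariant district-specific rainfall shortfall afterwards, so with district fixed effects it varies only through the interaction with the post-1884 indicator.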


The exclusion restriction required for this instrument to provide consistent estimates of the
coefficient $\gamma$ in equation (18) is that rainfall shortages in the 1876-78 agricultural years affect real
agricultural  income  six  years  later  only  because  of  their  effect  on  railroad  construction  (due  to  the 
1880  Famine  Commission).  There  are  two  potential  concerns  with  this  exclusion  restriction.  The 
first  potential  concern  is  that  rainfall  may  affect  real  agricultural  income  directly,  because  rain¬ 
fall  is  an  important  input  for  rain-fed  agriculture  (I  find  direct  evidence  for  this  in  Step  5  below, 
and  indirect  evidence  for  this  as  revealed  in  trade  flows  and  price  responsiveness  in  Steps  2  and 
3,  respectively).  For  this  reason,  I  control  for  rainfall  and  rainfall  lagged  up  to  10  years  in  my  IV 
regressions.  As  I  show  below,  there  is  no  evidence  for  statistically  significant  effects  of  rainfall  after 
a  lag  of  more  than  one  year,  which  casts  doubt  on  the  concern  that  rainfall  shortages  in  1876-78 
have a direct effect on real agricultural income post-1884.

A  second  potential  concern  with  this  IV  strategy  is  that  famines  (or  the  official  inquiries  that 
followed them) may have long-lived effects on real income by potentially changing policies,
institutions, demographics (through mortality, fertility, or out-migration), or (animal and human) capital
stocks. For this reason, I examine whether rainfall deviations (from long-run means) in the ten
other years in which famine was officially declared (and official inquiries were conducted) in India
appear to affect either railroad construction or real agricultural income six years later.76 First, I
find that rainfall in non-1876-78 famine years does not predict railroad construction six years later.
This is an important falsification exercise because, of the eleven famine inquiries, only the 1880
inquiry mentioned railroad construction. Second, I find that in no other (i.e.,
non-1876-78) famine year do rainfall anomalies affect real agricultural income six years later. To
the extent that all famines and their inquiries had the same potential for long-lived effects on
agriculture, this suggests that it was not the famine or its inquiry per se that caused rainfall shortages
in 1876-78 to have long-lived effects on real agricultural income.

In the light of these checks, a remaining concern with my instrumental variable’s exclusion
restriction is that the 1876-78 famine, or its 1880 Commission, was unique in some way other than its
effect on railroad construction.77 While the exclusion restriction is fundamentally untestable, it is
comforting  that  the  most  obviously  unique  feature  of  the  1880  Commission  was  its  recommendation 
of  railroad  construction  (Bhatia  1967). 
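
The role of the exclusion restriction can be illustrated with a stylized simulation of the just-identified case, in which the IV estimator reduces to a ratio of covariances. This is a sketch under strong simplifying assumptions rather than the specification estimated here: it omits the fixed effects and rainfall controls, and the data-generating process, the true effect of 0.18, and the function names are hypothetical. An unobservable drives both railroad placement and income, so OLS is biased, while an instrument that is unrelated to the unobservable recovers the true effect.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

def ols_and_iv(n=20000, true_effect=0.18, seed=0):
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)               # instrument, unrelated to the unobservable
        ui = rng.gauss(0, 1)               # unobservable driving placement and income
        xi = 1.0 if zi + ui > 0 else 0.0   # railroad dummy
        yi = true_effect * xi + 0.5 * ui + rng.gauss(0, 0.1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    ols = cov(x, y) / cov(x, x)
    iv = cov(z, y) / cov(z, x)             # equals 2SLS when the model is just-identified
    return ols, iv

if __name__ == "__main__":
    ols, iv = ols_and_iv()
    print(round(ols, 2), round(iv, 2))
```

If the instrument also affected income directly (for instance, if rainfall shortages had persistent effects on agriculture), the numerator of the IV ratio would include that direct effect and the estimate would no longer be consistent, which is why the checks described above focus on lagged rainfall and on other famine years.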


Results: 


between  a  line  first  being  proposed  and  being  opened  for  traffic  in  my  sample  was  4.3  years. 

76There  were  official  parliamentary  famine  commissions  after  the  1896-97  and  1899-1900  famines,  in  addition  to 
that  after  the  1876-78  famine.  Official  government  inquiries  were  also  commissioned  after  the  1866-67,  1868-70, 
1873-74,  1888-89,  1905-06,  1906-07,  1907-08  and  1911-12  famines. 

77 This seems unlikely. For example, Visaria and Visaria (1983) summarize (in Appendix Table 5.2) the famines
in my sample period in tabular form along four dimensions: the number of people killed, and the geographic
regions (i.e., districts), land area, and number of people “affected”. The 1876-78 famine is not an outlier in any of
these dimensions. Lengthier treatments of famines in this time period, such as Bhatia (1967), McAlpin (1983),
and Maharatna (1996), do not see the 1876-78 famine as particularly unusual among India’s colonial-era famines,
especially when compared to the more severe 1896-97 and 1899-1900 famines.
Table 7 presents instrumental variable estimates of equation (18), beginning with first-stage
estimates for the 1880 Famine Commission instrumental variable in column 1. These estimates
demonstrate that the instrumental variable has a strong and statistically significant effect on
railroad location (even after controlling for contemporaneous rainfall, lagged rainfall (of up to 3 lags)
and district and year fixed effects). As this instrument has a high t-statistic, and the model is
just-identified, standard concerns over weak instruments are unlikely to arise here (Stock, Wright,
and Yogo 2002).78 Column 1 also demonstrates that contemporaneous and lagged rainfall variables
(up to three years of lags) do not predict railroad construction in general: these four variables are
individually and jointly insignificant.79 However, rainfall anomalies in one particular period, the
1876-78  agricultural  years  that  were  under  the  remit  of  the  1880  Famine  Commission,  do  predict 
railroad  construction  after  1884. 

Column 2 checks whether rainfall anomalies in the ten other years (besides 1876-78) that were officially
declared as famines by the Government of India predict railroad construction. After
each of these famine years, official reports were commissioned to recommend policies for future
famines, just as after the 1876-78 famine. To avoid estimating ten coefficients (one for each of the
ten famines) I estimate only two different effects from these non-1876-78 famines: one for the five
famines that were relatively more extreme (as defined by the number of people “affected”),80 and
one for the five that were relatively less extreme. The report on the 1876-78 famine was unique in
strongly  recommending  railroad  construction  for  famine  prevention,  and  the  results  in  column  2  are 
consistent  with  this  view:  I  find  that  rainfall  anomalies  in  other  officially-declared  famine  years  do 
not  predict  railroad  construction,  but  anomalies  in  1876-78  do. 

The  second-stage  results,  using  the  instrument  to  predict  the  railroad  dummy  variable  (‘railroad 
in  district’),  are  presented  in  columns  3  and  4  of  table  7.  Column  3  includes  the  own-railroad 
and  neighboring  district  railroad  dummy  variables;  these  IV  estimates  are  statistically  significantly 
different  from  zero,  and  of  a  very  similar  magnitude  to  the  OLS  results  presented  in  table  5.  This 
suggests  that  railroad  line  placement  decisions  were  not  driven  by  unobservable  and  time-varying 
determinants  of  real  agricultural  income,  other  than  those  already  controlled  for.  Importantly,  in 
column  3  I  find  that  while  contemporaneous  rainfall  has  a  large  and  statistically  significant  effect  on 
real  agricultural  income  (in  line  with  OLS  results  in  table  5),  lagged  values  of  rainfall  (up  to  three 
lags) appear to have no effect. This is reassuring from the perspective of the exclusion restriction
for  the  use  of  rainfall  in  1876-78  as  an  instrument  for  railroad  construction  post-1884.  Finally,  in 
column  4  I  check  whether  rainfall  anomalies  in  officially-declared  famine  years  (other  than  1876-78) 
have an effect on real agricultural income. I find no statistically significant coefficients on these
variables (individually or jointly), which suggests that there are no long-run effects of rainfall anomalies
in famine years, other than in 1876-78. This is likely to be due to the unique feature of the 1880
Famine Commission: that it recommended railroad construction.

78 The (heteroskedasticity and serial correlation robust) F-statistic on the excluded instrument in the first stage
is 7.91 (i.e. the square of this variable’s t-statistic).

79 An F-test for their joint significance has a p-value of 0.78, implying that the null hypothesis that their coefficients
are all zero cannot be rejected.

80 I take this measure from Visaria and Visaria (1983), Appendix 5.2. This is a more reliable measure of famine
intensity than the number of people killed because of the difficulty of measuring deaths in these instances. The five
most severe famines were those in 1868-70, 1873-74, 1896-97, 1899-1900 and 1907-08.

81 I find that the same is true for rainfall lags up to 10 years.


7.6  Bounds  Check 


Empirical  Strategy 

A concern when estimating equation (18) is that of indeterminate bias due to either positive or
negative selection on time-varying unobservables: some railroad projects may have targeted districts
where growth was expected and infrastructure would earn higher returns (which would introduce
positive bias due to selection on unobservables); other railroad projects may have targeted lagging
districts that were nonetheless politically important (which would introduce negative bias due to
selection on unobservables). The final strategy I employ to mitigate concerns about non-random
placement explores the empirical relevance of this positive and negative bias due to potential
selection on unobservables.

All lines built between 1883 and 1904 were required to be placed in one of four categories:
‘productive’ (expected to be commercially remunerative), ‘protective’ (intended to promote development
in poorer, famine-prone regions), ‘productive and protective’, or ‘military’ (built for strategic
motives). These categories were used for administrative purposes, but did not have any bearing on how
a line could be used. I interpret the lines that were categorized as ‘productive’ as being expected
to earn high returns (and lead to positive bias), and the lines categorized as ‘protective’ as being
targeted towards lagging regions (leading to negative bias). Therefore, a comparison of the effects
of lines that are designated as ‘productive’ and ‘protective’ will reveal bounds on the true effect of
railroads. If these bounds are tight then bias due to non-random railroad placement is unlikely
to be quantitatively important. As a further check on this procedure, the effect of lines designated
as ‘productive and protective’ should lie in between those from ‘protective’ and ‘productive’ lines.
Finally, the lines designated as ‘military’ could be biased in either direction.


To implement this strategy I estimate equation (18) with five separate coefficients on the
own-railroad regressor ($RAIL_{ot}$): one coefficient on each of the four categories of lines built between
1883 and 1904 inclusive, and a fifth coefficient on lines built before 1883 or after 1904 (periods during
which the categorization of railroad lines did not occur).


Results: 

Table 8 contains the results of the bounds check that I use to assess the magnitude of potential bias
due to non-random placement. For purposes of comparison with earlier results, column 1 replicates
column 2 of Table 5.

82 The annual Railway Reports reported railroad projects according to these categories. The initial motivation for
this categorization scheme first appeared in House of Commons Papers (1884).

83 An alternative interpretation of differential effects would be that different types of railroad projects have
heterogeneous treatment effects. As long as protective lines have lower treatment effects than productive lines, the
average treatment effect will still be bounded by the OLS estimates from these two different types of lines. This
concern simply widens the bounds.


Column 2 of Table 8 presents OLS estimates from five different types of railroad lines used in
equation (18): four from the four different categories in which lines were placed between 1883 and 1904,
and one for all lines built before 1883 or after 1904. The point estimates on these five types of lines
are all similar to each other. As anticipated by the argument above, lines categorized as ‘productive’
have the highest estimated coefficients, reflecting a potential upward bias on these estimates (due
to selection on unobservables). Likewise, the coefficient on lines categorized as ‘protective’ is the
lowest of the five coefficients, reflecting bias due to negative selection on unobservables. However,
the difference between these two coefficients is very small in comparison with the magnitude of the
effect of an average line. This suggests that the scope for positive or negative bias due to
endogenous selection is small. Put another way, the coefficient on ‘protective’ lines, which is likely to be
an underestimate of the effect of railroads on real agricultural income, is still almost 17 percent,
suggesting an important role for railroads in increasing real income.


7.7  Summary  and  Interpretation 


Taken together, the results from my placebo, instrumental variable, and bounds procedures suggest
that my earlier OLS results in table 5 can be interpreted as close approximations to unbiased
estimates of the effect of railroads on real agricultural income in India. The impression left by these
three procedures is that administrators in colonial India allocated railroads to districts at times that
were not related to unobservable, time-varying determinants of real agricultural income. This is
perhaps unsurprising given the strong military motivations for building railroads in India outlined
in section 2 and the difficulty in forecasting the attractiveness of competing railroad plans (as
evidenced by the stark disagreements among top-level Indian administrators described in section 7.4).

The results from this section suggest that railroads caused a large (18 percent) increase in real
agricultural income in India. This estimate is slightly larger than the estimate I would obtain from
using a social savings methodology, of 14.8 percent. The social savings approach is known to suffer
from indeterminate bias, so my results here suggest that the net bias to social savings estimates,
in the case of India, is negative. Both the results in this section and those obtained from a social
savings calculation ignore any welfare effects from changes in the volatility of real income due to
railroads. In the next section I look for evidence of such effects, as predicted by my model.

84 Two further points are of note in table 8. First, the lines categorized as ‘productive and protective’ have
a coefficient that lies between those on ‘productive’ and ‘protective’ lines. This is sensible, but was in no way
preordained, so it provides a check on the logic of the bounds procedure. Second, the lines categorized as ‘military’
have a coefficient that is similar in magnitude to that on all other types of lines. This coefficient is difficult to
interpret without a clear prior on the direction of its bias. Nevertheless, it is reassuring that this coefficient is similar
to that on other types of lines (though I cannot rule out the case that military lines had both a different treatment
effect from other types of lines, and a countervailing bias due to selection on unobservables).

85 The social savings approach (Fogel 1964) seeks to estimate the decrease in national income that would have
resulted had railroads not existed, and if the factors of production used in the railroad sector had instead been employed
in their next-best substitute (O’Brien (1977) and Fishlow (2000) review this literature). Hurd (1983) performs a social
savings calculation for India, which I adapt here. Hurd uses a transportation price reduction of a factor of four due to
railroads; my results from Table 2 suggest that this was an underestimate, so I instead use a reduction of a factor of 5.3
(the average reduction between any pair of districts in my sample). Using this reduction of 5.3 rather than four leads to
a social savings of 9.7 percent of aggregate GDP; expressed as a fraction of real agricultural income this is 14.8 percent.

86 Bias arises from two sources. First, because he was arguing against the ‘indispensability’ of railroads, Fogel


8  Empirical  Step  5:  Railroads  and  Volatility 


Step 4 of this paper has argued that railroads increased the level of real agricultural income in
India. Prediction 5 of my model suggests that railroads may have caused a second source of potential
welfare gains: a reduction in the volatility of real incomes. Indeed, as discussed in section 7.5, the
1880 Famine Commission recommended railroad construction for exactly this reason. In this section
I test prediction 5 of the model, and shed light on the potential for railroads to reduce the second
moment of income.


8.1  Empirical  Strategy 

Prediction 5 states that railroads will reduce the responsiveness of real agricultural income in a
district to its own rainfall shocks. Since rainfall was a stochastic input to production, a reduction
in the responsiveness of income to this input will reduce income volatility. To test prediction 5 I
estimate the following specification:

$$
\ln\left(\frac{r_{ot}}{\widetilde{P}_{ot}}\right) = \beta_o + \beta_t + \gamma RAIL_{ot}
+ \psi_1 \left(\frac{1}{N_o}\right) \sum_{d \in N_o} RAIL_{dt}
+ \psi_2 \left[\sum_k \frac{\mu_k}{\theta_k} \kappa RAIN^k_{ot}\right]
+ \psi_3 RAIL_{ot} \times \left[\sum_k \frac{\mu_k}{\theta_k} \kappa RAIN^k_{ot}\right]
+ \varepsilon_{ot}. \tag{19}
$$

This equation augments equation (18) to allow rainfall, $\sum_k \frac{\mu_k}{\theta_k} \kappa RAIN^k_{ot}$, to affect agricultural
production, and for railroad access, $RAIL_{ot}$, to moderate the influence of rainfall on real agricultural
income. The rainfall variable is the weighted sum of crop-specific rainfall measures (introduced in
section 6), where the weights are suggested by equation (10) of the model. Because the weights
depend on the parameters $\theta_k$, $\mu_k$ and $\kappa$, I use the values of these parameters estimated in Steps 1 and
2 to calculate the weights. (The weights sum to 0.1, not to one, because of the presence of $\kappa$ and $\theta_k$.)
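
As a purely illustrative sketch of how such a weighted rainfall regressor could be assembled, the snippet below applies weights of the form (mu_k / theta_k) * kappa to crop-specific rainfall. The commodity names and every numerical value are hypothetical placeholders, not the estimates from Steps 1 and 2; the point is only that the weights need not sum to one even when the budget shares do.

```python
# Hypothetical illustration of the weighted rainfall regressor in equation (19).
# All parameter values below are invented for illustration; they are NOT the
# estimates obtained in Steps 1 and 2 of the paper.

def rainfall_weights(mu, theta, kappa):
    """Return the weight (mu_k / theta_k) * kappa for each commodity k."""
    return {k: mu[k] / theta[k] * kappa for k in mu}

def weighted_rainfall(weights, rain):
    """Weighted sum of crop-specific rainfall for one district-year."""
    return sum(weights[k] * rain[k] for k in weights)

mu = {"rice": 0.5, "wheat": 0.3, "cotton": 0.2}     # budget shares (sum to one)
theta = {"rice": 4.0, "wheat": 5.0, "cotton": 2.0}  # hypothetical theta_k
kappa = 0.4                                         # hypothetical kappa
rain = {"rice": 1.2, "wheat": 0.8, "cotton": 1.0}   # crop-specific rainfall

weights = rainfall_weights(mu, theta, kappa)
total_weight = sum(weights.values())       # well below one in this example
regressor = weighted_rainfall(weights, rain)
print(round(total_weight, 4), round(regressor, 4))
```

Because each budget share is scaled by kappa / theta_k, the weights in this toy example sum to far less than one, which mirrors the remark above about the weights used in the paper.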

Earlier results (on trade flows and price responsiveness) suggested that rainfall is an important
input to agricultural production; the coefficient $\psi_2$ is therefore expected to be positive. Prediction 5
states that the coefficient $\psi_3$ will be negative, implying less responsiveness of real agricultural income
to rainfall variation when a district has railroad access. Finally, in line with prediction 4 (and as I
found in Step 4), the coefficients $\gamma$ and $\psi_1$ are expected to be positive and negative, respectively.


(1964)  chose  to  evaluate  social  savings  in  a  manner  (assuming  that  the  demand  for  transportation  was  perfectly 
inelastic)  that  was  deliberately  biased  upwards.  Second,  as  Fishlow  (1965),  David  (1969),  Williamson  (1974),  and 
Fogel  (1979)  have  argued,  the  social  savings  methodology  ignores  several  effects  of  railroads  (and  hence  arrives  at 
an  underestimate).  For  my  analysis,  the  most  relevant  of  these  uncounted  effects  is  that,  by  reducing  trade  costs, 
railroads  may  have  given  rise  to  aggregate  efficiency  gains  due  to  reallocations  in  transport-using  sectors.  This  is 
the mechanism stressed in the trade model that I develop (and find empirical support for) here.

8.2 Results


Table  9  presents  the  results  of  this  test  of  prediction  5.  Column  1  confirms  that  rainfall  is  an 
important  determinant  of  agricultural  production,  and  therefore  real  agricultural  income.  This  is 
in  line  with  my  results  from  Step  2  (where  high  rainfall  was  found  to  promote  exporting  success) 
and  from  Step  3  (where  high  rainfall  was  found  to  decrease  prices). 

However,  the  results  in  column  2  demonstrate  that  rainfall  is  a  much  stronger  determinant  of 
real  agricultural  income  before  a  district  gains  railroad  access  than  after.  That  is,  the  coefficient  on 
rainfall  (‘rainfall  in  district’)  is  2.4,  much  larger  than  in  column  1  (and  still  statistically  significant). 
This  coefficient  represents  the  responsiveness  of  real  agricultural  income  to  rainfall  before  the  district 
is  connected  to  the  railroad  network.  By  contrast,  the  effect  of  rainfall  on  real  agricultural  income 
after  a  district  gains  railroad  access  (represented  by  the  sum  of  the  coefficients  on  the  ‘rainfall  in 
district’  term  and  the  interaction  term  between  railroad  access  and  rainfall)  is  just  1.3.  Further, 
the  pattern  of  all  four  coefficients  in  column  2  is  in  line  with  that  predicted  by  predictions  4  and  5. 
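
The comparison in this paragraph can be restated as simple arithmetic. The sketch below uses only the two rounded coefficients quoted in the text (2.4 and 1.3); the implied interaction coefficient and the proportional fall are backed out from those rounded figures and are therefore approximations, not values read from Table 9.

```python
# Rounded coefficients quoted in the text for column 2 of Table 9.
rain_before_rail = 2.4  # coefficient on 'rainfall in district'
rain_after_rail = 1.3   # sum of the rainfall and interaction coefficients

# Interaction coefficient implied by the two rounded numbers above.
implied_interaction = rain_after_rail - rain_before_rail

# Proportional fall in the responsiveness of real income to rainfall.
proportional_fall = 1 - rain_after_rail / rain_before_rail

print(round(implied_interaction, 2), round(proportional_fall, 3))
```

On these rounded figures, railroad access is associated with a fall of a little under one half in the responsiveness of real agricultural income to rainfall.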

These results suggest that, in line with prediction 5, railroads played an important role in
reducing real income volatility in India because they reduced the responsiveness of a district’s real
income to its rainfall. A reduction in the volatility of real income may have contributed to welfare
gains in this setting, where most citizens had no access to formal insurance or banking facilities.


9 Empirical Step 6: A Sufficient Statistic for Railroads


Steps  1  to  3  of  this  paper  have  argued  that  railroads  significantly  improved  the  trading  environment 
in  India.  Steps  4  and  5  demonstrated  that  railroads  also  raised  the  level  of  real  agricultural  income, 
and  reduced  the  volatility  of  real  agricultural  income.  These  two  sets  of  results  are  qualitatively 
consistent  with  each  other,  in  the  context  of  my  model.  In  this  section  I  explore  whether  these  two 
sets  of  results  are  also  quantitatively  consistent  with  each  other  in  the  context  of  my  model.  Because 
the  reduced-form  impact  could  arrive  through  a  number  of  mechanisms,  the  exercise  in  this  section 
can  also  be  thought  of  as  determining  the  share  of  the  observed  reduced  form  impact  of  railroads 
that  can  be  explained  by  the  trade-based  mechanism  in  my  model. 


87 It is possible that while railroad access reduced the responsiveness of a district’s real income to its own rainfall,
railroad connections could have increased the responsiveness to neighboring districts’ rainfall (as I found in the
case of prices, in Step 3). I have estimated a specification similar to equation (19) but with an extension to include
dependence on neighboring districts’ rainfall and an interaction term for neighboring districts that are bilaterally
connected by rail to the district of observation. However, the coefficients on these two additional terms (neighbors’
rainfall and neighbors’ rainfall for railroad connected neighbors) are small and not statistically different from zero
(jointly or individually).

88 Roy (2001) describes how even the wealthiest members of society in colonial India (outside of major
cities) resorted to money-boxes and jewelry as the only means to save. Rosenzweig and Binswanger (1993) and
Rosenzweig and Wolpin (1993) document limited access to insurance in post-Independence India. The gains from
reduced consumption volatility may have been even more important to poor consumers due to subsistence concerns
(or if risk aversion decreases with income more generally).


9.1 Empirical Strategy


In order to compare the reduced-form impact of the railroad network on each district’s real
agricultural income (estimated in Steps 4 and 5) to the impact that is predicted by my model, I exploit
prediction 6. This prediction is equation (10), restated here for convenience:

$$
\ln\left(\frac{r_{ot}}{\widetilde{P}_{ot}}\right) = \sum_k \frac{\mu_k}{\theta_k} \ln A^k_{ot} - \sum_k \frac{\mu_k}{\theta_k} \ln \pi^k_{oot}. \tag{20}
$$

Prediction 6 thus states that real agricultural income ($r_{ot}/\widetilde{P}_{ot}$) is a function of only two terms:
technology ($A^k_{ot}$) and ‘openness’ ($\pi^k_{oot}$, the share of district $o$’s expenditure that it buys from itself),
appropriately summed over all commodities $k$.
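
To make the structure of prediction 6 concrete, the following minimal sketch evaluates a technology term minus an openness term, each weighted by mu_k / theta_k. The function name, commodity names and all numbers are hypothetical and chosen only for illustration; nothing here reproduces the paper's estimates.

```python
import math

def log_real_income(mu, theta, log_a, own_share):
    """Technology term minus openness term, each weighted by mu_k / theta_k."""
    weights = {k: mu[k] / theta[k] for k in mu}
    technology = sum(weights[k] * log_a[k] for k in weights)
    openness = sum(weights[k] * math.log(own_share[k]) for k in weights)
    return technology - openness

# Hypothetical parameter values (not estimates from the paper).
mu = {"rice": 0.6, "wheat": 0.4}      # budget shares
theta = {"rice": 4.0, "wheat": 5.0}   # hypothetical theta_k
log_a = {"rice": 0.2, "wheat": 0.1}   # log technology, e.g. kappa * rainfall

# Under autarky a district buys everything from itself (own share = 1).
autarky = log_real_income(mu, theta, log_a, {"rice": 1.0, "wheat": 1.0})
# With trade the own-expenditure share falls below one.
open_economy = log_real_income(mu, theta, log_a, {"rice": 0.8, "wheat": 0.5})

gain_from_trade = open_economy - autarky
print(round(autarky, 4), round(gain_from_trade, 4))
```

Holding technology fixed, a lower own-expenditure share raises log real income in this sketch, which is the sense in which openness summarizes the effect of a trade cost reduction.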

To estimate this equation I need to substitute in observable variables for the unobserved terms $A^k_{ot}$
and $\pi^k_{oot}$, and the unobserved parameters $\theta_k$ and $\mu_k$. I estimated the function $\ln A^k_{ot} = \kappa RAIN^k_{ot}$
(where $RAIN^k_{ot}$ is observable) and the parameters $\theta_k$ in Step 2 (using trade data), and the
parameter $\mu_k$ is simply the consumer’s budget share. Finally, I compute the openness term that
emerges in equilibrium in my model when it is evaluated at the parameters $\kappa$, $\theta_k$, and $\mu_k$, and
$\delta$ and $\alpha$ (estimated in Step 1 using salt price data). I refer to the computed openness term as
$\hat{\pi}^k_{oot}(\hat{\Theta}, RAIN_t, R_t, L)$ to denote its dependence on the full vector of estimated model parameters
$\hat{\Theta} = (\hat{\theta}, \hat{\mu}, \hat{\alpha}, \hat{\delta}, \hat{\kappa})$, as well as the full vector (across districts, commodities and years) of exogenous
variables, rainfall ($RAIN_t$), the transportation network ($R_t$), and land sizes ($L$).


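Prediction 6 (i.e. equation (20)) states that, once rainfall (i.e. $A^k_{ot}$) is controlled for (and weighted
over commodities $k$ in the manner suggested by this equation), openness ($\pi^k_{oot}$) in year $t$ is a
sufficient statistic for the impact of the entire railroad network open in year $t$ on real income in year
$t$. To test prediction 6 I estimate equation (19), but additionally include the sufficient statistic
variable, openness ($\pi^k_{oot}$):

$$
\ln\left(\frac{r_{ot}}{\widetilde{P}_{ot}}\right) = \beta_o + \beta_t + \gamma RAIL_{ot}
+ \psi_1 \left(\frac{1}{N_o}\right) \sum_{d \in N_o} RAIL_{dt}
+ \psi_2 \left[\sum_k \frac{\mu_k}{\theta_k} \kappa RAIN^k_{ot}\right]
+ \psi_3 RAIL_{ot} \times \left[\sum_k \frac{\mu_k}{\theta_k} \kappa RAIN^k_{ot}\right]
+ \psi_4 \sum_k \frac{\mu_k}{\theta_k} \ln \hat{\pi}^k_{oot}(\hat{\Theta}, RAIN_t, R_t, L)
+ \varepsilon_{ot}. \tag{21}
$$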


If openness is truly a sufficient statistic, as predicted by my model, then when openness is
included in equation (21) all other railroad variables should lose predictive power. That is, prediction
6 states that the coefficients $\gamma$, $\psi_1$ and $\psi_3$ should be zero in this regression. Further, taking the
model equation (20) literally, prediction 6 also states that the coefficients $\psi_2$ (on the rainfall term)
and $\psi_4$ (on the openness term) will equal one and minus one, respectively.

89 While it would be possible in principle to use trade data to observe $\pi^k_{oot}$ in the data, this faces two limitations:
first, as the model makes clear, $\pi^k_{oot}$ is endogenous to the error term in equation (20), so an instrumental variables
methodology would be necessary; and second, the only internal trade data available from colonial India are presented
at a more aggregated level, and begin in a later year, than the data on all other variables in equation (20).

90 I estimate these Cobb-Douglas weights as the average (over trade blocks and years) expenditure share for
commodity k, where expenditure is calculated as output minus trade.


9.2  Results 


Table 10 presents OLS estimates of equation (21) in order to test Prediction 6 and shed light on
the role for a trade-based mechanism, such as that highlighted in my model, to account for the
reduced-form impact of railroads on real income levels and volatility.

Column  1  restates  column  2  of  table  9  (discussed  in  the  previous  section)  as  a  point  of  departure. 
This  specification  makes  it  clear  that  there  is  a  large  reduced-form  impact  of  railroads  on  both  the 
level  and  volatility  of  real  agricultural  income  in  the  average  district  in  India.  While  these  results 
could  reflect  the  increased  opportunities  to  trade  that  railroads  brought  about  (an  effect  for  which 
I  found  evidence  in  Step  1),  other  possible  mechanisms  could  also  be  at  work. 


Following the strategy laid out in equation (21), column 2 of table 10 adds a variable, ‘openness’
(which I compute in my model using parameter estimates from Steps 1 and 2), to the regression in
column 1. Consistent with prediction 6 of the model, the coefficients on own-railroad access,
neighboring districts’ railroad access, and the interaction between own-railroad access and rainfall (all
of which were statistically and economically significant in column 1) have all fallen to a level that
is close to zero (and whose 95 percent confidence intervals include zero). This is consistent with
the idea that openness is a sufficient statistic for the impact of railroads on real agricultural income
(and its responsiveness to rainfall), as predicted by the model.

In  further  agreement  with  prediction  6,  the  coefficient  on  the  openness  term  is  close  to  minus 
one, implying that openness, when measured in a model-consistent manner, is a strong determinant
of real agricultural income. Notably, the model parameters that enter the openness term were
not  estimated  using  data  that  enters  the  current  estimating  equation,  so  the  impressive  fit  of  the 
openness  term  was  not  preordained.  Finally,  the  last  part  of  prediction  6,  that  the  coefficient  on 
the  rainfall  measure  should  be  one,  is  now  also  corroborated  in  a  statistical  sense;  the  coefficient  on 
rainfall  has  fallen  (when  compared  to  column  1)  to  a  level  that  is  close  to  one. 

Finally, taking the point estimate of 0.021 on own-railroad access ($RAIL_{ot}$) seriously implies that
only 12 percent (i.e. 0.021 divided by 0.186, expressed as a percentage) of the total impact of the
railroads estimated in column 1 cannot be explained by the mechanism of enhanced opportunities
to trade according to comparative advantage, represented in the model. That is, 88 percent of the
total impact of the railroads on real income in an average district can be explained by the model.
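
The decomposition above is a single ratio, restated below using only the two rounded coefficients quoted in the text. With these rounded inputs the ratio comes to roughly 11.3 percent; the 12 percent figure reported above presumably reflects unrounded coefficients, which is an assumption on my part since the unrounded values are not shown here.

```python
# Rounded point estimates on own-railroad access quoted in the text.
gamma_without_openness = 0.186  # column 1 of Table 10
gamma_with_openness = 0.021     # column 2 of Table 10 (openness included)

# Share of the reduced-form effect left unexplained once openness is included.
share_unexplained = gamma_with_openness / gamma_without_openness
share_explained = 1 - share_unexplained

print(round(100 * share_unexplained, 1), round(100 * share_explained, 1))
```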

The results in table 10 establish a firm, quantitative connection between the earlier results in
this paper: that railroads improved the ability to trade within India (Steps 1, 2 and 3), and that
railroads raised real incomes (Step 4) and reduced the volatility of real income (Step 5) in India.
These results suggest that the important welfare gains that railroads brought about can be well
accounted for by the specific mechanism of comparative advantage based gains from trade.

91 The computed openness term, $\hat{\pi}^k_{oot}(\hat{\Theta}, RAIN_t, R_t, L)$, is a generated regressor, so conventional standard errors
obtained when using it will be incorrect. In principle, it is possible to obtain correct standard errors by using a bootstrap
procedure (as in Step 2), but this is computationally expensive here because this is the third step of an estimation
procedure (and hence a three-step bootstrap procedure would be required). Adding to the difficulty, the first step is
nonlinear and the second step involves 85 separate regressions. I have not calculated bootstrapped standard errors for the
regressions in this section because of the computation time required. However, the empirical procedure in this section
is concerned primarily with the magnitude of point estimates rather than statistical inference about these estimates.

10  Conclusion 

This  paper  has  made  three  contributions  to  our  understanding  of  the  effects  of  large  transportation 
infrastructure  projects,  in  the  context  of  an  enormous  expansion  in  transportation  infrastructure — 
the  construction  of  India’s  railroads.  Using  new  district-level  data  that  I  have  collected  from  archival 
sources,  my  first  contribution  is  to  estimate  the  effect  of  India’s  railroads  on  the  trading  environment 
there.  I  find  that  railroads  reduced  the  cost  of  trading,  reduced  inter-regional  price  gaps,  increased 
trade  volumes,  and  brought  India’s  district  economies  close  to  the  small  open  economy  limit  where 
local  prices  are  unresponsive  to  local  productivity  shocks. 

My second contribution is to estimate the effect of India’s railroads on welfare in colonial India.
I find that when the railroad network was extended to the average district, real agricultural
income in that district rose by approximately 18 percent. I also find that railroads reduced real
income volatility. While it is possible that railroads were deliberately allocated to districts on the
basis of time-varying characteristics unobservable to economists today, I find little evidence for this
potential source of bias to my results in placebo, instrumental variable, or bounds checks. These
reduced-form findings suggest that railroads brought welfare gains to colonial India, but say very
little about the economic mechanisms behind these gains.

Finally, my third contribution is to shed light on the mechanisms at work by relating the
observed railroad-driven reduction in trade costs to the observed railroad-driven increase in welfare.
To  do  so  requires  a  calibrated,  general  equilibrium  model  of  trade  with  many  regions,  many  goods, 
and  unrestricted  trade  costs.  I  extend  the  work  of  Eaton  and  Kortum  (2002)  to  construct  such  a 
model  and  estimate  its  unknown  parameters  using  auxiliary  model  equations.  The  model  identifies 
a  sufficient  statistic  for  the  effect  of  trade  cost  reductions  on  welfare,  which  accounts  empirically  for 
virtually  all  of  the  observed  effects  of  railroads.  This  suggests  that  railroads  raised  welfare  in  India 
primarily  because  they  reduced  the  cost  of  trading,  and  enabled  districts  to  enjoy  more  of  the  gains 
from  trade  due  to  comparative  advantage. 

One  limitation  of  the  present  study  is  its  focus,  due  to  data  constraints,  on  the  real  income  of  a 
representative  agent  in  each  district.  This  focus  removes  the  possibility  that  a  trade  cost  reduction 
could  reduce  the  real  returns  to  some  factors  of  production  (or  leave  their  real  incomes  more  exposed 
to  climatic  variation).  Such  distributional  concerns  feature  prominently  in  modern  theories  of  the 
causes  of  famines,  such  as  Sen  (1981).  Large  transportation  infrastructure  improvements,  such  as 
India’s  railroads,  offer  the  chance  to  probe  empirically  these  distributional  concerns,  and  to  test 
economic models of famine.

A Data Appendix

This appendix provides information (supplementary to that in section 2) on the variables used in
this paper.


Sample  of  Districts: 

The  data  I  use  in  this  paper  cover  the  areas  of  modern-day  India,  Pakistan  and  Bangladesh,  most 
of  the  area  known  as  British  India.  I  work  with  a  panel  of  239  geographic  units  of  analysis  that  I 
refer to as districts, for as much of the period 1861 to 1930 as possible.


Trade  Cost  Proxy  Variables: 

I construct trade cost proxy variables using a newly constructed GIS database on the Indian
transportation network, from 1851 to 1930. The database covers four modes of transportation: railroads,
roads, rivers and coastal shipping. Each segment (approximately 20 km long) of the railroad
network is coded according to the year in which it was opened. For river transport I keep only those
rivers that are reported in Schwartzberg (1978) or Bourne (1849) as navigable in 1857. The final
component of the colonial India GIS database that I construct is the location of each district and
salt source. To calculate district locations I digitize a map of the district borders in India (as they
existed in 1891). I use this to calculate district centroids, which I take to be the ‘location’ of each
district. Finally, I obtain the location of each salt source from contemporary maps.
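
As a rough illustration of the centroid step, the sketch below computes the area-weighted centroid of a simple polygon from its boundary vertices. This is a generic planar formula shown under my own assumptions (planar coordinates, a single non-self-intersecting boundary); the text does not state which GIS routine was actually used, and the example polygon is invented.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices.

    Assumes planar coordinates and a non-self-intersecting boundary listed
    in order (either clockwise or counter-clockwise).
    """
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2).
    return cx / (3 * area2), cy / (3 * area2)

# Hypothetical rectangular 'district' used only to check the formula.
district_border = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
centroid = polygon_centroid(district_border)
print(centroid)
```

For real district borders in geographic coordinates, a projection to a planar coordinate system would be needed before applying a formula of this kind.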

I then convert the GIS database of transportation lines and district/salt source locations into a
graph of nodes and arcs, as is common in the transportation literature (Black 2003). I work with a
simplified graph representation of the Indian transportation network, where the number of nodes and
the sparsity of arcs is low enough for network algorithms to be feasibly operated on it using a desktop
computer (the resulting network has 7651 nodes). Because the density of informal roads was
extremely high (Deloche 1994), I allow road transport to occur along the straight line between any two
nodes on the network, but only if the two nodes either represent districts or salt sources, or the two
nodes are within 1000 km of each other.97 The result is a network with 7651 nodes, 5616 of which
represent the railroad network, 660 of which represent the navigable river network, 890 of which
represent coastal shipping routes, 477 of which represent the centroids of the 477 districts in India
(in 1891 borders), and 8 of which represent the locations of the sources of 8 different types of salt.
Because the railroad arcs are coded with a year of opening indicator, this network can be restricted to
represent the transportation network for any year from 1851 to 1930 by simply turning these arcs on or off.


92The majority of British India was under direct British control, and was divided into nine large, administrative
units known as provinces. Each province was further sub-divided into a total of 223 districts, which are the units of
analysis that I track from 1861 to 1930. Areas not under direct British control were known as ‘Princely States’. For
administrative purposes these were grouped into divisions similar to the provinces and districts described above, so
in princely state areas I use the lower administrative units as my units of analysis and refer to them as districts,
following the Indian Administrative Atlas (Singh and Banthia 2004). There were 251 of these districts in my sample
area, but data collection in the princely states was extremely incomplete and I include only 16 districts from the
princely state regions in my final sample.

93To construct this database, I begin with a GIS database that contains the locations of contemporary railroad,
river and coast lines from the Digital Chart of the World.

94To do this I use the publication History of Indian Railways, Constructed and in Progress (1918 and 1966
volumes), the 1966 volume of which refers to railway lines in modern-day India only. To obtain years of opening
for line segments in modern-day Pakistan and Bangladesh from 1919 to 1930 I use the annual Railway Reports
published by the Railways Department, which list all line section openings in each year.

95I use the maps in the Indian Administrative Atlas and Constable’s Hand Atlas of India (Bartholomew 1893) to
create this digital map.

96To do this, I use the ‘simplify’ command in ArcGIS. A line in ArcGIS is a series of vertices connected by straight
lines. The ‘simplify’ command removes vertices in such a way as to minimize the sum of squared distances between
the original line and the simplified line. The original Digital Chart of the World railway layer, for example,
consists of approximately 33,000 vertices; I simplify the railway layer to one of only 5616 vertices.

97Allowing straight-line road travel between any two nodes would yield a network with over 58 million arcs. The
shortest path between each of the nodes on such a dense network cannot be calculated using a desktop computer,
so I restrict many of these arcs to be non-existent; the result is that the 7651-by-7651 matrix representing the
network can be stored as a sparse matrix, and analyzed using sparse matrix routines (that increase computation

Finally, I use this network representation of the Indian transportation system to calculate the
two trade cost proxy variables described in the main text. One such proxy is a measure of the cost of
traveling between any two points (where a point is either a district or a salt source) in a year using
the lowest-cost route along the network available in that year. The lowest-cost route depends on
the value of the relative freight rates, $\alpha$, and the available network, $R_t$. Conditional on values
of $\alpha$, I use a standard algorithm from graph theory and transportation science (Dijkstra’s algorithm)
to calculate the shortest path between every pair of points, along the transportation network available
in each year from 1861 to 1930. The resulting measure, $LCR_{odt}(\alpha, R_t)$, is in units of
railroad-equivalent kilometers. The second proxy variable for trade costs is a dummy variable that indicates
when it is possible to travel between two districts without leaving the railroad network. This is
easily constructed using the transportation network representation and digital map of Indian districts
described above.
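
The two proxies can be illustrated with a minimal Python sketch. It is not the sparse-matrix Matlab implementation mentioned in the footnotes. The arc format, the function names and the toy two-district network are assumptions made for illustration, and the relative freight rates are the historical values quoted in the notes to Table 2. The rail-only check is done at the level of nodes, which simplifies the district-level definition used in the paper.

```python
import heapq

# Per-km cost of each mode relative to rail (rail normalized to 1).
# Values are the 'historical freight rates' from the notes to Table 2.
ALPHA = {"rail": 1.0, "road": 4.5, "river": 3.0, "coast": 2.25}

def build_graph(arcs, year, alpha):
    """Adjacency list for the network available in `year`.

    Each arc is a hypothetical tuple (node_a, node_b, mode, km, year_opened);
    year_opened is None for arcs that are always available.
    """
    graph = {}
    for a, b, mode, km, opened in arcs:
        if opened is not None and opened > year:
            continue  # arc not yet open in this year: 'turned off'
        cost = alpha[mode] * km  # cost in railroad-equivalent kilometers
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph

def lowest_cost_route(graph, origin):
    """Dijkstra's algorithm: lowest cost from `origin` to each reachable node."""
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def rail_connected(arcs, year, origin, destination):
    """Second proxy (node-level simplification): reachable by rail arcs only?"""
    rail_only = [arc for arc in arcs if arc[2] == "rail"]
    graph = build_graph(rail_only, year, {"rail": 1.0})
    return destination in lowest_cost_route(graph, origin)

# Toy network: districts A and B are 100 km apart by road and are joined
# by a 120 km railroad that opens in 1870.
toy_arcs = [
    ("A", "B", "road", 100.0, None),
    ("A", "B", "rail", 120.0, 1870),
]
lcr_1865 = lowest_cost_route(build_graph(toy_arcs, 1865, ALPHA), "A")["B"]
lcr_1875 = lowest_cost_route(build_graph(toy_arcs, 1875, ALPHA), "A")["B"]
```

In this toy case the lowest-cost route falls from 450 railroad-equivalent kilometers (100 km of road at a relative rate of 4.5) to 120 once the railroad opens, and the rail-connection dummy switches from zero to one.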


Bilateral  Trade  Flows: 

The  data  I  use  on  bilateral  trade  flows  was  collected  from  a  variety  of  different  sources,  one  for  each 
mode  of  transportation.  I  describe  each  of  these  modes  in  turn,  and  then  how  they  were  combined 
into  aggregate  data  on  trade  flows. 

Data on railroad trade within India were published separately for each province. The geographic
unit of analysis in these records is the ‘trade block’, which spans between four and five districts.98
The railroad trade flow data, like those on all modes of transportation described below, represent
final shipments between two regions (even if a shipment changed railroad companies).99 Only
if a shipment was taken off the railroad system and re-shipped onwards would it be counted as
two separate shipments. I collect these data from various annual, provincial publications from 1880
onwards.100 Data on river-borne trade within India were published in a similar manner to the railroad
trade data, for the Brahmaputra, Ganges and Indus river systems. I collect the river-borne trade
data from the railroad trade statistics publications for the provinces of Assam, Bengal, Northwestern
Provinces, and Sind. Data on trade within India that occurred via coastal shipping was published by


speed  dramatically)  in  Matlab. 

98Trade blocks split into smaller blocks over time, but I aggregate over these splits to maintain constant geographic
units. The trade blocks were always drawn so as to include whole numbers of districts.

99All bilateral block-to-block intra-provincial trade flows were published, except those from a block to itself (which
were always unreported). Inter-provincial trade flows were published from each internal block to each external
province (and vice versa), but not by trade block within the external province. I therefore create a full set of
inter-provincial block-to-block flows by following an analogous procedure to that used to prepare bilateral trade
data on provincial-state trade between Canada and the United States in McCallum (1995) and Anderson and van
Wincoop (2003). This method assigns a province’s trade block’s imports from each of another province’s trade blocks
in proportion to the exporting blocks’ stated exports to the entire importing province (and vice versa for exports).
In order to match the internal block-level railroad trade data to international trade data (leaving via specified ports,
as described below), I apply a similar proportionality method. This is possible because the railroad trade data
differentiate railroad trade to/from principal ports (in each province) from trade bound for non-port consumption.
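
The proportionality rule in footnote 99 can be sketched in Python as follows. This is an illustrative reading of the footnote, not the code used for the paper; the function name, the dictionary layout and the block labels and trade values are all hypothetical.

```python
def allocate_block_flows(imports_from_province, exports_to_province):
    """Split each importing block's imports from an exporting province across
    that province's blocks, in proportion to each exporting block's stated
    exports to the importing province as a whole.

    imports_from_province: {importing_block: imports from the exporting province}
    exports_to_province:   {exporting_block: exports to the importing province}
    Returns {(exporting_block, importing_block): estimated flow}.
    """
    total_exports = sum(exports_to_province.values())
    flows = {}
    for d, imports in imports_from_province.items():
        for o, exports in exports_to_province.items():
            # Share of the exporting province's shipments that come from block o
            share = exports / total_exports if total_exports > 0 else 0.0
            flows[(o, d)] = imports * share
    return flows

# Hypothetical example: province P has blocks P1 and P2, province Q has
# blocks Q1 and Q2; only block-to-province totals are observed.
exports_to_q = {"P1": 30.0, "P2": 10.0}    # each P block's exports to all of Q
imports_from_p = {"Q1": 24.0, "Q2": 16.0}  # each Q block's imports from all of P
flows = allocate_block_flows(imports_from_p, exports_to_q)
```

Here block P1 accounts for three quarters of province P's exports to province Q, so it is assigned three quarters of each Q block's imports from P (18 of Q1's 24 and 12 of Q2's 16).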

100The titles of these publications changed over time, from Returns of the Rail [and River-borne] Trade of [Province]
to Report on the trade carried by rail [and river] in [Province] to Report on Inland Trade of [Province]. In the
province of Madras, these statistics were only published from 1909 onwards. Railroad trade statistics were not
published by the princely states themselves, but each province’s external trade to/from each of the large princely
states was published. I therefore treat each large princely state (Central India Agency, Hyderabad, Mysore,
Rajputana and Travancore) as a single trade block.

each of the coastal provinces (Bengal, Bombay, Madras and Sind) in a similar manner to the railroad
trade data. I collected the coastal trade data from various annual, provincial publications.101

Data on international trade leaving India was published separately for trade by maritime shipping
and by roads. Each province published its own maritime international trade statistics, with
each reporting the trade to and from its major and minor ports.102 This international maritime
trade data was presented disaggregated into over 30 foreign countries, but to maintain consistent
geographic units over time I aggregate these 30 countries into 24 foreign regions. Foreign trade
by land occurred (in extremely small volumes) between Bengal, Northwest Provinces and Punjab
provinces and neighboring foreign countries (modern-day Nepal, China, Afghanistan and Bhutan).
This trade data was published by each of these provinces, disaggregated by the border post through
which trade left or arrived. I collect this data from various annual, provincial publications.103

Trade data by all modes of transport discussed above was published disaggregated by commodities.104
In order to compare commodities across these different levels of aggregation, I aggregate all
data to the 85-commodity level.105 Finally, I aggregate the trade data on each of the modes (for
each commodity separately) into one trade dataset.106 All of the above trade data are available from
(at least) 1861 to 1930 (and beyond), except for the railroad trade data. The railroad trade data
only starts in a coherent manner in 1880, and was discontinued in 1920. I therefore use bilateral
trade data from 1880 to 1920 only.


Rainfall  Data: 

A thick network of 3614 rain gauges at meteorological stations (illustrated in Figure 1) recorded
daily rainfall amounts from 1891-1930. From 1901 onwards, these records have been digitized by
the Global Historical Climatology Network (Daily) project; the GHCN dataset also provides the
latitude and longitude of each station. For the years 1891-1900, I collect the data from the
publication Daily Rainfall for India in the year.... In the years 1865 to 1890, very little daily rainfall
data was published in colonial India, but monthly data from 365 stations (spread throughout India)
was published by each province.107 I convert monthly station-level data to daily station-level data
using a procedure that is common in the meteorological statistics literature (e.g., Ngo-Duc, Polcher,
and Laval 2005).108 I convert station-level data to district-level data by simply averaging over the


101  The  coastal  trade  data  were  published  in  publications  whose  titles  changed  from  Annual  Statement  of  the 
Sea-borne  Trade  and  Navigation  of  [Province]  to  Report  on  the  Maritime  Trade  of  [Province]. 

102The  maritime  international  trade  data  was  published  in  the  same  publications  as  those  containing  the  coastal 
trade  data,  described  above. 

103The overland international trade data was published in: Annual Report on the Trans-frontier Trade of Bihar and
Orissa with Nepal; Bengal Frontier Trade: Trade of Bengal with Nepal, Tibet, Sikkim and Bhutan; Accounts Relating
to the Trade by Land of British India with Foreign Countries; Annual Report on the Foreign Trade of the United
Provinces; and Report on the External Land Trade of the Punjab. I assign each of these border posts to the internal
trade block in which it is located, and assume that all of the foreign land trade came to/from these blocks only.

104The  railroad  and  river-borne  trade  data  reported  85-100  commodities  (depending  on  the  year  and  province),  the 
coastal  shipping  data  200-400  commodities,  and  the  international  maritime  shipping  data  over  400  commodities. 

105I  use  the  commodity  classification  in  the  international  maritime  shipping  publications  (used  to  organize  the  over 
400  commodities  in  these  publications)  to  do  this. 

106Wherever  relevant,  I  treat  the  regions  of  modern-day  Afghanistan,  Myanmar  and  Sri  Lanka  as  foreign  countries, 
since  they  are  outside  of  the  region  on  which  I  have  other  data  from  India. 

107These publications included the Administration Reports for each province, described in the agricultural price
data section below. I use additional data (to increase the number of stations) that was published in selected
provinces’ Sanitary Reports.

108Using daily data from 1891 to 1930, I estimate the district-specific relationship between the pattern of monthly
rainfall in a year and the rainfall on any day of that year; I then use these estimated relationships to predict the
rainfall on any day in a given district and year from 1865 to 1890, conditional on the pattern of monthly rainfall
actually observed in that district and year. While these daily rainfall predictions are likely to be imprecise, much

many stations in each district.109


Prices  of  Salt  and  Agricultural  Commodities: 

I use data on eight different types of salt for each of the six provinces in Northern India,110 and
on 17 agricultural commodities from all of India.111 I collect this price data from various
annual, provincial publications.112 Prices reported in these publications were an average of
observations taken by district officers once per fortnight at each of 10-15 leading retail markets per
district.


Real  Agricultural  Income: 

I use data that present the area under each of 17 crops (the 17 for which price data are available),
and the yield per acre for each of these crops, in each district and year.113 I take the product of
each area and yield pair to create a measure of real output for each crop, district and year. I then
evaluate this bundle of 17 real output measures at the prices prevailing for these crops (from the
agricultural price data described above), in each district and year, to create a measure of total
nominal agricultural output for each district and year. Finally, I divide nominal output by a consumer
price index (the Fisher ideal index) to create a measure of real income.114
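
The construction of real income can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the two-crop example and all numbers are hypothetical, and the Fisher ideal index is computed in its textbook form (the geometric mean of the Laspeyres and Paasche indices) against a single base period. The choice of base period and the consumption weights actually used in the paper are not reproduced here.

```python
from math import sqrt

def nominal_output(area, yield_per_acre, price):
    """Sum over crops of area x yield x price (nominal agricultural output)."""
    return sum(area[k] * yield_per_acre[k] * price[k] for k in area)

def fisher_index(p0, q0, p1, q1):
    """Fisher ideal price index of period 1 relative to base period 0.

    p0, p1: prices by crop; q0, q1: consumption quantities by crop.
    """
    laspeyres = sum(p1[k] * q0[k] for k in p0) / sum(p0[k] * q0[k] for k in p0)
    paasche = sum(p1[k] * q1[k] for k in p0) / sum(p0[k] * q1[k] for k in p0)
    return sqrt(laspeyres * paasche)

# Hypothetical district-year with two crops.
area = {"rice": 100.0, "wheat": 50.0}          # acres
yield_per_acre = {"rice": 0.5, "wheat": 0.25}  # maunds per acre
p_base = {"rice": 2.0, "wheat": 4.0}           # base-period prices
p_now = {"rice": 4.0, "wheat": 8.0}            # current prices (all doubled)
q_base = {"rice": 10.0, "wheat": 5.0}          # base-period consumption
q_now = {"rice": 8.0, "wheat": 6.0}            # current consumption

nominal = nominal_output(area, yield_per_acre, p_now)
cpi = fisher_index(p_base, q_base, p_now, q_now)
real_income = nominal / cpi
```

Because every price doubles in this example, both the Laspeyres and Paasche indices equal 2, so the Fisher index is 2 and real income is half of nominal output.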


of  the  imprecision  is  averaged  over  when  I  construct  crop-specific  rainfall  shocks,  which  are  measures  of  the  total 
rainfall in a given period (a length ranging from 55 to 123 days).

109If a given district-day has no reported rainfall observations I impute this missing observation by using an inverse
distance-weighted average of that day’s rainfall at the 5 closest reporting stations (known as “Shepard’s method” in
the meteorological literature (Shepard 1968)).
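
The imputation in footnote 109 can be sketched in Python as follows. The footnote does not state the distance measure, the target point or the weighting exponent, so the great-circle distance, the use of a single target coordinate (for example a district centroid) and the exponent of 2 are assumptions made for illustration, as are the function names and the example stations.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def idw_rainfall(target, stations, k=5, power=2.0):
    """Inverse distance-weighted average of rainfall at the k nearest stations.

    target:   (lat, lon) of the point to impute (assumed: district centroid)
    stations: list of (lat, lon, rainfall) for stations reporting that day
    power:    distance exponent (assumed; not given in the footnote)
    """
    if not stations:
        raise ValueError("no reporting stations on this day")
    nearest = sorted(
        (haversine_km(target[0], target[1], lat, lon), rain)
        for lat, lon, rain in stations
    )[:k]
    for dist, rain in nearest:
        if dist == 0.0:
            return rain  # a station sits exactly at the target point
    weights = [dist ** -power for dist, _ in nearest]
    return sum(w * rain for w, (_, rain) in zip(weights, nearest)) / sum(weights)

# Hypothetical example: two equidistant stations and one distant station
# that is excluded when only the two nearest stations are used.
stations = [(20.0, 77.0, 10.0), (20.0, 79.0, 20.0), (30.0, 90.0, 100.0)]
estimate = idw_rainfall((20.0, 78.0), stations, k=2)
```

With two equidistant stations the weights are equal, so the imputed value is the simple mean of the two readings.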

110These eight salt types are those from: the Bombay sea salt sources near the city of Bombay, salt from the UK
distributed via Calcutta, the Didwana salt source in Punjab, the Kohat mines in Punjab (principally the Jatta mine,
according to Watt (1889)), the Mandi mine in Punjab, the Salt Range mines in Punjab (principally the Mayo mine,
according to Watt (1889)), the Sambhar Salt Lake in Rajputana, and the Sultanpur source in the Central India Agency.

111These crops are: bajra, barley, bengal gram, cotton, indigo, jowar, kangni, linseed, maize, opium, ragi, rape and
mustard seed, rice, sesamum, sugarcane, tur and wheat.

112These publications were: Prices and Wages in India; Administration Reports from all provinces; the Salt Report
of Northern India; the Statistical Atlas of Andhra State with agricultural price data (for the Madras Presidency);
the Season and Crop Reports from various provinces with agricultural price data; and the Sanitary Reports from
various provinces with data on prices of food grains.

113These  data  were  published  in  Agricultural  Statistics  of  India  from  1884  to  1930.  For  the  years  1870-1883  I 
use data on crop areas and yields in the provincial Administration Reports, as described in the agricultural prices
data  section  above.  Data  on  agricultural  output  was  published  in  each  province’s  Administration  Report  except  for 
Punjab  and  Bengal.  For  supplementary  data  I  use  each  province’s  Season  and  Crops  Report  between  1904  and  1930. 
While  Blyn  (1966)  and  Heston  (1973)  have  discussed  the  potential  for  measurement  error  in  these  sources,  these 
authors  have  not  been  concerned  with  mechanisms  through  which  measurement  error  might  be  correlated  with  the 
regressors  I  use  in  this  paper. 

114In order to compute this consumer price index I use district and year specific consumption weights from the
internal trade data, computing consumption as output minus net exports.

Figure 1: Meteorological stations in colonial India: Dots represent the 3914
meteorological stations with rain gauges collecting daily rainfall in the period from 1891-1930.
District borders are also shown. Source: Global Historical Climatology Network
and author's calculations; see Appendix A for full details.


Figure  2:  The  evolution  of  India's  railroad  network,  1860-1930:  These  figures 
display  the  decadal  evolution  of  the  railroad  network  (railroads  depicted  with  thick  lines)  in 
colonial  India  (the  outline  of  which  is  depicted  with  thin  lines).  The  first  railroad  lines  were  laid 
in  1853.  This  figure  is  based  on  a  GIS  database  in  which  each  20  km  railroad  segment  is  coded 
with  a  year  of  opening  variable.  Source:  Author's  calculations  based  on  official  publications. 
See  Appendix  A  for  details. 


Table 1: Descriptive Statistics

                                                      Number of       Beginning of      End of
                                                      Observations    Available Data    Available Data

Value of agricultural output per acre, all crops      14,340          27.3              111.3
  (current rupees)                                                    (10.4)            (40.8)
Agricultural prices (average over all crops,          14,340          2.37              5.07
  current rupees per maund)                                           (1.37)            (2.35)
Real agricultural income per acre (1870 rupees)       14,340          27.3              38.0
                                                                      (10.4)            (13.8)
Real agricultural income per acre, coefficient of     13,384          0.06              0.04
  variation over past 5 years                                         (0.04)            (0.03)
Price of salt, all sources (current rupees per maund) 7,329           5.19              3.45
                                                                      (1.96)            (0.465)
Total annual rainfall (meters)                        14,340          1.011             1.145
                                                                      (0.798)           (1.302)
Crop-specific rainfall shock (meters)                 73,000          0.638             0.662
                                                                      (0.614)           (0.602)
Crop-specific rainfall shock, coefficient of          68,821          0.520             0.541
  variation over past 5 years                                         (0.417)           (0.480)
Exports per trade block (millions of 1870 rupees)     6,581,327       0.711             3.581
                                                                      (0.649)           (2.444)

Notes: Values are sample means over all observations for the year in question, with standard deviations in parentheses. Beginning and
end of available data are: 1870 and 1930 for agricultural output and real agricultural income; 1861 and 1930 for agricultural prices and salt
prices; 1867 and 1930 for all rainfall variables; and 1880 and 1920 for trade data. A 'maund' is equal to 37.3 kg and was the standardized unit
of weight in colonial India. Data sources and construction are described in full in Appendix A.


Table 2: Railroads and Trade Costs (Step 1)

Dependent variable: Log salt price at destination (1)         (2)         (3)         (4)         (5)

Source connected to destination by railroad       -0.112                                          0.023
                                                  (0.046)***                                      (0.342)
Log distance to source, along lowest-cost route               0.135
  (at historical freight rates)                               (0.038)***
Log distance to source, along lowest-cost route                           0.255       0.247       0.233
  (at estimated mode costs)                                               (0.059)***  (0.063)***  (0.074)***
Estimated mode costs:
  Railroad (normalised to 1)                                              1           1           1
                                                                          N/A         N/A         N/A
  Road                                                                    7.341       7.880       7.711
                                                                          (1.687)***  (1.913)***  (2.032)***
  River                                                                   3.396       3.821       4.118
                                                                          (0.760)***  (1.034)***  (1.381)***
  Coast                                                                   3.213       3.942       3.612
                                                                          (1.893)*    (2.581)     (2.674)
Salt type x Year fixed effects                    YES         YES         YES         YES         YES
Salt type x Destination district fixed effects    YES         YES         YES         YES         YES
Salt type x Destination district trends           YES         YES         NO          YES         YES
Observations                                      7329        7329        7329        7329        7329
R-squared                                         0.841       0.960       0.953       0.974       0.975

Notes: Regressions estimating equation (11) using data on 8 types of salt (listed in Appendix A), from 124 districts in 5 Northern Indian
provinces (listed in Appendix A), annually from 1861 to 1930. Columns 1 and 2 are OLS regressions, and columns 3-5 are NLS regressions
(with block-bootstrapped standard errors). 'Source connected to destination by railroad' is a dummy variable equal to one in all years when it
is possible to travel by railroad from any point in the district containing the salt source to any point in the destination district. The 'distance to
source, along lowest-cost route' variable is a measure of the railroad-equivalent kilometres (because railroad freight rate is normalized to 1)
between the salt source and the destination district, along the lowest-cost route given relative mode costs per unit distance. 'Historical freight
rates' used are 4.5, 3.0 and 2.25 respectively for road, river and coastal mode costs per unit distance, all relative to rail transport. Data
sources and construction are described in full in Appendix A. Heteroskedasticity-robust standard errors corrected for clustering at the
destination district level are reported in parentheses. *** indicates statistically significantly different from zero at the 1% level; ** indicates 5%
level; and * indicates 10% level.

Table 3: Railroads and Trade Flows (Step 2)

Dependent variable: Log value of exports                                (1)           (2)           (3)

Fraction of origin and destination districts connected by railroad      1.482
                                                                        (0.395)***
Log distance between origin and destination along lowest-cost route                   -1.303        -1.284
                                                                                      (0.210)***    (0.441)***
(Log distance between origin and destination along lowest-cost route)
  x (Weight per unit value of commodity in 1880)                                                    -0.054
                                                                                                    (0.048)
(Log distance between origin and destination along lowest-cost route)
  x (High-value railroad freight class of commodity in 1880)                                        0.031
                                                                                                    (0.056)
Origin trade block x Year x Commodity fixed effects                     YES           YES           YES
Destination trade block x Year x Commodity fixed effects                YES           YES           YES
Origin trade block x Destination trade block x Commodity fixed effects  YES           YES           YES
Origin trade block x Destination trade block x Commodity trends         YES           YES           YES
Observations                                                            6,581,327     6,581,327     6,581,327
R-squared                                                               0.943         0.963         0.964

Notes: Regressions estimating equations (13) and (14), using data on 85 commodities, 45 trade blocks, and 23 foreign countries, annually
from 1880 to 1920. 'Fraction of origin and destination districts connected by railroad' is the share of the district pairs between trade block o
and trade block d for which it is possible to travel entirely by railroad from any point in one district to any point in the other district. The
'distance between origin and destination along lowest-cost route' variable is a measure of the railroad-equivalent kilometres (because the
railroad freight rate is normalized to 1) between the centroid of the origin and destination trade blocks in question, along the lowest-cost route
given relative freight rates for each mode of transport (as estimated in Table 2). 'Weight per unit value in 1880' is the weight (in maunds) per
rupee, as measured by 1880 prices. 'Railroad freight class in 1880' is an indicator variable for all commodities that were classified in the
higher (more expensive) freight class in 1880; salt was in the omitted category (low-value commodities). Data sources and construction are
described in full in Appendix A. Standard errors are reported in parentheses. In column 1 these are heteroskedasticity-robust standard
errors adjusted for clustering at the exporting block level. In columns 2-3 these are bootstrapped standard errors (using a two-stage block
bootstrap at the exporting block level) to correct for the generated regressor. *** indicates statistically significantly different from zero at the
1% level; ** indicates 5% level; and * indicates 10% level.


Table 4: Railroads and Price Responsiveness (Step 3)

Dependent variable: Log price                     (1)         (2)         (3)         (4)

Local rainfall in sowing and growing periods      -0.256      -0.428      -0.215      -0.402
                                                  (0.102)**   (0.184)***  (0.105)**   (0.125)***
(Local rainfall) x (Railroad in district)                     0.414                   0.375
                                                              (0.195)**               (0.184)**
Rainfall in neighboring districts                                         -0.051      -0.021
                                                                          (0.023)**   (0.018)
Average: (Rainfall in neighboring districts) x
  (Connected to neighboring districts by railroad)                                    -0.082
                                                                                      (0.036)***
Crop x Year fixed effects                         YES         YES         YES         YES
Crop x District fixed effects                     YES         YES         YES         YES
District x Year fixed effects                     YES         YES         YES         YES
Observations                                      73,000      73,000      73,000      73,000
R-squared                                         0.891       0.892       0.894       0.899

Notes: OLS regressions estimating equation (16) using data on 17 agricultural crops (listed in Appendix A), from 239 districts in India,
annually from 1861 to 1930. 'Local rainfall in sowing and growing period' (abbrev. 'local rainfall') refers to the amount of rainfall (measured in
meters) in the district in question that fell during crop- and district-specific sowing and harvesting dates. 'Railroad in district' is a dummy
variable whose value is one if any part of the district in question is penetrated by a railroad line. 'Rainfall in neighboring districts' is the
variable 'local rainfall' averaged over all districts within a 250 km radius of the district in question. 'Connected to neighboring district' is a
dummy variable that is equal to one if the district in question is connected by a railroad line to each neighboring district within 250 km. Data
sources and construction are described in full in Appendix A. Heteroskedasticity-robust standard errors corrected for clustering at the district
level are reported in parentheses. *** indicates statistically significantly different from zero at the 1% level; ** indicates 5% level; and *
indicates 10% level.

Table 5: Railroads and Real Income Levels (Step 4) - OLS Estimates

Dependent variable: Log real agricultural income per acre   (1)         (2)

Railroad in district                                        0.164       0.182
                                                            (0.056)***  (0.071)**
Railroad in neighboring district                                        -0.042
                                                                        (0.020)**
District fixed effects                                      YES         YES
Year fixed effects                                          YES         YES
Observations                                                14,340      14,340
R-squared                                                   0.744       0.758

Notes: OLS regressions estimating equation (18) using real income constructed from crop-level data on 17 principal agricultural crops
(listed in Appendix A), from 239 districts in India, annually from 1870 to 1930. 'Railroad in district' is a dummy variable whose value is one if
any part of the district in question is penetrated by a railroad line. 'Railroad in neighboring districts' is the variable 'railroad in district'
averaged over all districts within a 250 km radius of the district of observation. Data sources and construction are described in full in
Appendix A. Heteroskedasticity-robust standard errors corrected for clustering at the district level are reported in parentheses. *** indicates
statistically significantly different from zero at the 1% level; ** indicates 5% level; and * indicates 10% level.


Table  6:  Railroads  and  Real  Income  Levels  (Step  4)  -  Placebo  Specifications 


Dependent  variable:  log  real  agricultural  income  per  acre  (1)  (2)  (3)  (4) 

Railroad  in  district  0.172  0.190  0.167  0.188 

(0.099)  (0.099)*  (0.083)**  (0.075)** 

Railroad  in  neighboring  district  -0.031  -0.028  -0.058  -0.047 

(0.039)  (0.031)  (0.029)**  (0.034) 

Unbuilt  railroad  in  district,  abandoned  after  proposal  stage  0.008 

(0.021) 

Unbuilt  railroad  in  district,  abandoned  after  reconnoitering  stage  -0.002 

(0.048) 

Unbuilt  railroad  in  district,  abandoned  after  survey  stage  0.014 

(0.038) 

Unbuilt  railroad  in  district,  abandoned  after  sanction  stage  0.010 

(0.082) 


(Unbuilt  railroad  in  district,  included  in  Lawrence  Plan  1869-1873)  0.010 

x  (post-1869  indicator)  (0.056) 

(Unbuilt  railroad  in  district,  included  in  Lawrence  Plan  1874-1878)  -0.054 

x  (post-1874  indicator)  (0.067) 

(Unbuilt  railroad  in  district,  included  in  Lawrence  Plan  1879-1883)  0.008 

x  (post-1879  indicator)  (0.051) 


(Unbuilt  railroad  in  district,  included  in  Lawrence  Plan  1884-1888)  0.068

x  (post-1884  indicator)  (0.104)

(Unbuilt  railroad  in  district,  included  in  Lawrence  Plan  1889-1893)  -0.092

x  (post-1889  indicator)  (0.087)

(Unbuilt  railroad  in  district,  included  in  Lawrence  Plan  1894-1898)  0.041

x  (post-1894  indicator)  (0.058)

(Unbuilt  railroad  in  district,  in  Bombay  Chamber  of  Commerce  plans)  0.003

x  (post-1883  indicator)  (0.033)

(Unbuilt  railroad  in  district,  in  Madras  Chamber  of  Commerce  plans)  -0.063

x  (post-1883  indicator)  (0.098)

(Unbuilt  railroad  in  district,  included  in  Kennedy  plan,  high-priority)  0.0005

x  (year  -  1848)  (0.038)

(Unbuilt  railroad  in  district,  included  in  Kennedy  plan,  low-priority)  -0.001

x  (year  -  1848)  (0.026)


District  fixed  effects  YES  YES  YES  YES

Year  fixed  effects  YES  YES  YES  YES

Observations  14,340  14,340  14,340  14,340

R-squared  0.769  0.769  0.770  0.770

Notes: OLS regressions similar to those in Table 5. 'Railroad in district' and 'Railroad in neighboring districts' are defined in the notes to Table 5.
'Unbuilt railroad in district, abandoned after X stage' is a dummy variable whose value is one if a line that was abandoned after stage 'X' penetrates
a district, in all years after the line was first mentioned as reaching stage 'X' in official documents. The stages 'X' are: 'proposal', where the line was
mentioned in official documents; 'reconnoitering', where the line route was explored by surveyors in rough detail; 'survey', where the exact route of the
line and the nature of all engineering works were decided on after a detailed survey; and 'sanction', where the surveyed line was given official permission
to be built. The 'Lawrence 1868 plan' was a proposal for significant railroad expansion by India's Governor General that was not implemented; the plan
detailed proposed dates of construction (in 5-year segments) over the next 30 years, which are used in the construction of this variable. The 'Chambers
of Commerce plans' were expansion proposals invited from the Madras and Bombay Chambers of Commerce in 1883, which were never
implemented. The 'Kennedy plan' was an early plan of construction-cost-minimizing routes drawn up by India's chief engineer in 1848 (divided into high-
and low-priority lines), which was rejected in favor of Dalhousie's direct routes plan. Heteroskedasticity-robust standard errors corrected for clustering at
the district level are reported in parentheses.
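
One way to read Table 6 is to compare each placebo coefficient with its clustered standard error. The sketch below does this for the fourteen unbuilt-line coefficients reported in the table. The dictionary keys are shorthand labels introduced here, and the 1.645 threshold is the usual two-sided 10 percent critical value under a large-sample normal approximation, which is an assumption of this illustration rather than something stated in the notes.

```python
# (coefficient, clustered standard error) for each unbuilt-line variable in Table 6.
placebos = {
    "abandoned after proposal stage": (0.008, 0.021),
    "abandoned after reconnoitering stage": (-0.002, 0.048),
    "abandoned after survey stage": (0.014, 0.038),
    "abandoned after sanction stage": (0.010, 0.082),
    "Lawrence Plan 1869-1873": (0.010, 0.056),
    "Lawrence Plan 1874-1878": (-0.054, 0.067),
    "Lawrence Plan 1879-1883": (0.008, 0.051),
    "Lawrence Plan 1884-1888": (0.068, 0.104),
    "Lawrence Plan 1889-1893": (-0.092, 0.087),
    "Lawrence Plan 1894-1898": (0.041, 0.058),
    "Bombay Chamber of Commerce": (0.003, 0.033),
    "Madras Chamber of Commerce": (-0.063, 0.098),
    "Kennedy plan, high-priority": (0.0005, 0.038),
    "Kennedy plan, low-priority": (-0.001, 0.026),
}

# Assumed threshold: two-sided 10% critical value of the standard normal.
CRITICAL_10PCT = 1.645

# Ratio of each coefficient to its standard error.
t_ratios = {name: coef / se for name, (coef, se) in placebos.items()}
largest = max(t_ratios, key=lambda name: abs(t_ratios[name]))
significant = [name for name, t in t_ratios.items() if abs(t) >= CRITICAL_10PCT]

print(f"largest |t|: {largest} ({abs(t_ratios[largest]):.2f})")
print(f"significant at the 10% level: {significant}")
```

With these inputs the largest ratio is about 1.06 in absolute value, so none of the unbuilt-line coefficients is distinguishable from zero at conventional levels.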


Table  7:  Railroads  and  Real  Income  Levels  (Step  4)  -  IV  Estimates 


Dependent variable:    Railroad in district (First Stage)    Railroad in district (First Stage)    Log real ag. income (Second Stage)    Log real ag. income (Second Stage)

Estimation method:    OLS    OLS    IV    IV

(1)    (2)    (3)    (4)

(Rainfall in 1876-78 ag. year minus long-run mean)    -0.051    -0.047

x (Post-1884 indicator)    (0.018)***    (0.019)**

Railroad in district    0.184    0.193

(0.084)**    (0.082)**

Railroad in neighboring district    -0.056    -0.052

(0.034)    (0.031)

Rainfall in district    0.013    1.123    1.118

(0.089)    (0.518)**    (0.429)**

Rainfall in district (lagged 1 year)    -0.003    0.328

(0.048)    (0.294)

Rainfall in district (lagged 2 years)    0.009    0.024

(0.064)    (0.182)

Rainfall in district (lagged 3 years)    -0.001    -0.043

(0.058)    (0.195)

(Rainfall  in  severe  famine  ag.  year  minus  long-run  mean) 
x  (Indicator  for  6  years  after  famine  year) 

(Rainfall  in  mild  famine  ag.  year  minus  long-run  mean) 
x  (Indicator  for  6  years  after  famine  year) 

0.015 

(0.034) 

-0.003 

(0.027) 

0.010 

(0.025) 

0.006 

(0.021) 

District fixed effects    YES    YES    YES    YES

Year fixed effects    YES    YES    YES    YES

Observations    14,340    14,340    14,340    14,340

R-squared    0.651    0.650    0.733    0.743

Notes: Regressions estimating equation (18) using real income constructed from crop-level data on 17 principal agricultural crops (listed in
Appendix A), from 239 districts in India, annually from 1870 to 1930. 'Railroad in district' is a dummy variable whose value is one if any part of
the district in question is penetrated by a railroad line. 'Rainfall in 1876-78 agricultural year minus long-run mean' is the amount of rainfall (in
meters) in a district from 1 May 1876 to 30 April 1878, minus the district's average annual rainfall in agricultural years from 1870 to 1930.
'Railroad in neighboring districts' is the variable 'railroad in district' averaged over all districts within a 250 km radius of the district of
observation. 'Rainfall in district' is a measure (in meters) of the amount of crop-specific rainfall that fell in the district, averaged over all 17
crops using the appropriate weighting in equation (20) of the text. 'Rainfall in severe/mild famine agricultural year minus long-run mean' is
similar to the variable defined above for the 1876-78 famine, but for five other famines designated as either 'severe' or 'mild' as in the text.
Data sources and construction are described in full in Appendix A. Heteroskedasticity-robust standard errors corrected for clustering at the
district level are reported in parentheses. *** indicates statistically significantly different from zero at the 1% level; ** indicates the 5% level; and *
indicates the 10% level.
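
The logic of Table 7 (a first stage that predicts railroad access from an instrument, followed by a second stage that uses only the predicted variation) can be illustrated with a just-identified two-stage least squares sketch. Everything here is a simplifying assumption: the data are synthetic, there is a single instrument `z` and a single endogenous regressor `x`, and the district and year fixed effects, rainfall controls, and clustered standard errors used in the table are omitted. The confounder `u` is constructed to be uncorrelated with `z` in the sample, so the instrument is valid by design.

```python
def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Sample covariance (divisor n) between two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)

def two_stage_least_squares(z, x, y):
    """Just-identified 2SLS: one instrument z, one endogenous regressor x."""
    # First stage: regress x on z and form fitted values.
    first_stage_slope = ols_slope(z, x)
    intercept = mean(x) - first_stage_slope * mean(z)
    x_hat = [intercept + first_stage_slope * zi for zi in z]
    # Second stage: regress y on the fitted values.
    return ols_slope(x_hat, y)

# Synthetic data: u is an unobserved confounder, orthogonal to z in this sample.
z = [-2, -1, 0, 1, 2, -2, -1, 0, 1, 2]
u = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]
x = [0.5 * zi + ui for zi, ui in zip(z, u)]        # endogenous regressor
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]  # true effect of x is 2.0

beta_ols = ols_slope(x, y)
beta_iv = two_stage_least_squares(z, x, y)
print(f"OLS slope: {beta_ols:.3f}, 2SLS slope: {beta_iv:.3f}")
```

In this toy example OLS is biased upward by the confounder while the two-stage estimate recovers the true coefficient. The sketch says nothing about the direction of any bias in the actual data.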


Table  8:  Railroads  and  Real  Income  Levels  (Step  4)  -  Bounds  Check 


Dependent  variable:  log  real  agricultural  income  per  acre  (1)  (2) 


Railroad  in  district  0.182 

(0.071)*** 

Railroad  in  neighboring  district  -0.042  -0.037 

(0.020)**  (0.031) 

Railroad  in  district,  built  pre-1883  or  post-1904  0.174 

(0.100)* 

Railroad  in  district,  line  labelled  as  'productive'  0.212 

(0.119) 

Railroad  in  district,  line  labelled  as  'protective'  0.168 

(0.144) 

Railroad  in  district,  line  labelled  as  'productive  and  protective'  0.173 

(0.138) 

Railroad  in  district,  line  labelled  as  'military'  0.204 

(0.197) 


District  fixed  effects  YES  YES 

Year  fixed  effects  YES  YES 

Observations  14,340  14,340 

R-squared  0.758  0.760


Notes: OLS regressions estimating equation (18) using real income constructed from crop-level data on 17 principal agricultural crops (listed
in Appendix A), from 239 districts in India, annually from 1870 to 1930. 'Railroad in district' is a dummy variable whose value is one if any part
of the district in question is penetrated by a railroad line (calculated over all years in the sample). 'Railroad in neighboring district' was defined
in the notes to Table 5. 'Railroad in district, built pre-1883 or post-1904' is a similar variable defined in these time periods only, because in these time
periods lines were not designated according to primary intended use. 'Railroad in district, line labelled as X' is a dummy variable whose value
is one if any part of the district in question is penetrated by a line whose primary intended use was designated (between 1883 and 1904,
when all lines required such a designation) as 'X'. The intended primary uses 'X' are: 'productive', where the line was expected to be
commercially remunerative; 'protective', where the line was intended to be redistributive towards lagging regions; 'productive and protective',
where the line was intended to have both of the previous primary uses; and 'military', where the line was built for military/strategic reasons.
Heteroskedasticity-robust standard errors corrected for clustering at the district level are reported in parentheses. *** indicates statistically
significantly different from zero at the 1% level; ** indicates the 5% level; and * indicates the 10% level.


Table  9:  Railroads  and  Real  Income  Volatility  (Step  5) 


Dependent variable: Log real agricultural income per acre   (1)           (2)

Railroad in district                                        0.186         0.252
                                                            (0.085)**     (0.132)*

Rainfall in district                                        1.248         2.434
                                                            (0.430)***    (0.741)***

(Railroad in district) x (Rainfall in district)                           -1.184
                                                                          (0.482)**

Railroad in neighboring district                            -0.031        -0.022
                                                            (0.021)*      (0.027)

District fixed effects                                      YES           YES

Year fixed effects                                          YES           YES

Observations                                                14,340        14,340

R-squared                                                   0.767         0.770

Notes: OLS regressions estimating equation (19) using real income constructed from crop-level data on 17 principal agricultural crops (listed
in Appendix A), from 239 districts in India, annually from 1870 to 1930. 'Railroad in district' is a dummy variable whose value is one if any part
of the district in question is penetrated by a railroad line. 'Rainfall in district' is a weighted sum of a district's crop-specific rainfall amounts (in
meters), summed over all 17 crops with weights as suggested by my model, as in equation (19). 'Railroad in neighboring districts' was
defined in the notes to Table 5. Data sources and construction are described in full in Appendix A. Heteroskedasticity-robust standard errors
corrected for clustering at the district level are reported in parentheses. *** indicates statistically significantly different from zero at the 1%
level; ** indicates the 5% level; and * indicates the 10% level.
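
To see what the interaction term in column 2 of Table 9 implies, it helps to write out the marginal effect of rainfall on log real income with and without a railroad. The notation below is shorthand introduced for this illustration (the symbols $\hat\beta_{RAIN}$ and $\hat\beta_{RAIL \times RAIN}$ are not taken from the paper), and the calculation uses only the reported point estimates, ignoring sampling error.

```latex
\begin{align*}
\left.\frac{\partial \ln(\text{real income})}{\partial\,\mathit{RAIN}}\right|_{\mathit{RAIL}=0}
  &= \hat\beta_{RAIN} = 2.434, \\[4pt]
\left.\frac{\partial \ln(\text{real income})}{\partial\,\mathit{RAIN}}\right|_{\mathit{RAIL}=1}
  &= \hat\beta_{RAIN} + \hat\beta_{RAIL \times RAIN} = 2.434 - 1.184 = 1.250 .
\end{align*}
```

On these point estimates, railroad access lowers the responsiveness of real agricultural income to rainfall by roughly 49 percent (1.184 / 2.434 ≈ 0.49).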

Table  10:  A  Sufficient  Statistic  for  Railroad  Impact  (Step  6) 


Dependent variable: Log real agricultural income per acre   (1)           (2)

Railroad in district                                        0.252         0.021
                                                            (0.132)*      (.0096)

Rainfall in district                                        2.434         1.044
                                                            (0.741)***    (0.476)**

(Railroad in district) x (Rainfall in district)             -1.184        0.042
                                                            (0.482)**     (0.64)

Railroad in neighboring district                            -0.022        0.003
                                                            (0.027)       (0.041)

"Openness", as computed in model                                          -0.942
                                                                          (0.152)***

District fixed effects                                      YES           YES

Year fixed effects                                          YES           YES

Observations                                                14,340        14,340

R-squared                                                   0.770         0.788

Notes: OLS regressions estimating equation (19) in column 1 and equation (21) in column 2, using real income constructed from crop-level
data on 17 principal agricultural crops (listed in Appendix A), from 239 districts in India, annually from 1870 to 1930. 'Railroad in
district' is a dummy variable whose value is one if any part of the district in question is penetrated by a railroad line. 'Rainfall in district' is a
weighted sum of a district's crop-specific rainfall amounts (in meters), summed over all 17 crops with weights as suggested by my model,
as in equation (20) (where for reasons explained in the text, the weights sum to 4.6). 'Railroad in neighboring districts' was defined in the
notes to Table 5. 'Openness' is the share of a district's expenditure that it buys from itself; this variable is computed in the equilibrium of the
model, where the model parameters are set to those estimated in Steps 1 and 2, and the exogenous variables (the transportation network,
rainfall, and district land sizes) are as observed. Data sources and construction are described in full in Appendix A.
Heteroskedasticity-robust standard errors corrected for clustering at the district level are reported in parentheses. *** indicates statistically significantly
different from zero at the 1% level; ** indicates the 5% level; and * indicates the 10% level.
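
The comparison between columns 1 and 2 of Table 10 can be summarized with a little arithmetic: how much of the 'Railroad in district' coefficient disappears once the model's openness term is included. The sketch below is a back-of-the-envelope reading of the reported point estimates only. The function names are introduced here for illustration, and converting a coefficient on a log outcome to a percent change via exp(b) - 1 is a standard identity rather than a calculation taken from the paper.

```python
import math

def share_absorbed(coef_without, coef_with):
    """Fraction of a coefficient that vanishes once an extra regressor is added."""
    return 1.0 - coef_with / coef_without

def log_points_to_percent(coef):
    """Percent change in the level implied by a coefficient on a log outcome."""
    return 100.0 * (math.exp(coef) - 1.0)

# 'Railroad in district' point estimates from Table 10, columns 1 and 2.
rail_col1, rail_col2 = 0.252, 0.021

absorbed = share_absorbed(rail_col1, rail_col2)
effect_col1 = log_points_to_percent(rail_col1)
effect_col2 = log_points_to_percent(rail_col2)

print(f"share of railroad coefficient absorbed by openness: {absorbed:.1%}")
print(f"implied income effect, column 1: {effect_col1:.1f}%")
print(f"implied income effect, column 2: {effect_col2:.1f}%")
```

On these numbers, roughly 92 percent of the column 1 railroad coefficient is accounted for once openness is controlled for.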


